Tag
Andrew Kelley, creator of Zig, argues that LLM-assisted contributions are detectable through distinctive mistakes and a 'digital smell,' and compares them to smoking in a non-smoking house.
Researchers release OpenGame, an open agentic coding framework tailored for game development.
A developer reports productive local agentic coding with a 4-bit MLX build of Qwen3.6-35B and the pi.dev tool, completing real tickets efficiently on current hardware.
Alibaba releases Qwen3.6-27B-FP8, a 27B FP8-quantized model with strong agentic coding and reasoning benchmarks, now available on Hugging Face.
Qwen releases the open-weight Qwen3.6-27B model on Hugging Face, featuring improved stability, agentic coding capabilities, and thinking preservation for better developer productivity.
OpenGame is an open-source agentic framework for end-to-end web game creation, powered by the specialized GameCoder-27B model and evaluated via the new OpenGame-Bench benchmark.
This paper introduces the Precise Debugging Benchmark (PDB), a framework that evaluates LLMs on precise fault localization rather than just test pass rates. Results show frontier models like GPT-4.1-Codex and DeepSeek-V3.2-Thinking pass 76%+ of unit tests but achieve less than 45% edit precision, revealing a critical gap between code regeneration and true debugging.
A test-time scaling framework for agentic coding that compresses rollout trajectories into structured summaries and uses recursive voting/PDR to boost Claude-4.5-Opus to 77.6% on SWE-Bench Verified.
Qwen releases Qwen3.6-35B-A3B, an open-weight Mixture-of-Experts model with 35B total parameters and 3B active parameters, featuring significant improvements in agentic coding and reasoning preservation.
Steve Yegge claims that Google's AI adoption lags behind industry standards, with most engineers still using basic chat tools. Google executives Addy Osmani and Demis Hassabis publicly dispute the claims, stating that over 40K engineers use agentic coding tools weekly.
OpenAI releases GPT-5.2-Codex, an advanced agentic coding model optimized for complex software engineering tasks with improvements in long-context understanding, Windows support, and cybersecurity capabilities. The model achieves state-of-the-art performance on SWE-Bench Pro and Terminal-Bench 2.0, and is now available to paid ChatGPT users with API access coming in the following weeks.
DeepCode is a fully autonomous framework for document-to-codebase synthesis that uses principled information-flow management to convert scientific papers into production-grade code, achieving state-of-the-art results on PaperBench and surpassing PhD-level human experts.
This paper presents the first large-scale empirical study of agent context files (READMEs) used in agentic coding tools, analyzing their structure, maintenance patterns, and content. It finds that while functional context is well covered, non-functional requirements such as security and performance are rarely specified.