I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

Reddit r/LocalLLaMA 05/18/26, 06:38 AM Tools

coding-agent open-source local-models small-models benchmark code-generation developer-tool

Summary

The author built SmallCode, a coding agent optimized for small local models, achieving 87% benchmark success with a 4B parameter model using techniques like compound tools, improvement loops, and token budgeting.

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. Try them with a local model like Gemma or Qwen and they fall apart. In my experience, tool calls fail, context overflows, and multi-step tasks collapse.

So I built SmallCode. It's designed from the ground up for small local models.

**The result:** 87/100 benchmark tasks pass with a Gemma 4 model that only activates 4B parameters per token. OpenCode scores ~75% with 14B models. The harness does the heavy lifting, not the model size.

**How it works (the tricks that make small models reliable):**

* **Compound tools:** Instead of making the model chain 4 tool calls (find file → read file → edit file → verify), SmallCode gives it one tool that does all 4. Small models lose coherence after 3+ sequential calls, and bundling the steps cuts failures in half.
* **Improvement loop:** Every time the model writes code, SmallCode instantly compiles/lints it. If that fails, SmallCode feeds the errors back automatically. The model doesn't need to be smart enough to get it right first try; it just needs to fix errors when shown them.
* **Decompose on failure:** If the model fails the same thing twice, SmallCode stops retrying and instead breaks the problem into smaller pieces. "Fix this 200-line file" becomes "fix line 45 only."
* **Escalation:** If even decomposition fails and you have a Claude/OpenAI key configured, it auto-escalates to the bigger model for just that one task. You stay local 95% of the time, cloud 5%.
* **Token budgeting:** Small models have 32k-256k context. SmallCode never dumps a whole file in. It summarizes, truncates, and manages every token so the model never sees "..." truncation in the middle of important code.
* **Code graph:** Instead of grep-searching your codebase, SmallCode indexes your code into a symbol graph (functions, classes, who-calls-what). When you ask "how does auth work," it walks the graph and returns just the relevant connected code, not 15 random file snippets.

**What it looks like:** Full-screen terminal UI (like OpenCode/vim), scrollable chat, command palette with `/`, plugin system, persistent memory across sessions.

**What it doesn't do:**

* No LSP integration (yet)
* No multi-session (yet)
* No desktop app
* Doesn't compete with Claude Code for frontier model users

**Install:**

    npm install -g smallcode
    cd your-project
    smallcode

Point it at LM Studio, Ollama, or any OpenAI-compatible endpoint.

MIT licensed, everything's on GitHub: [https://github.com/Doorman11991/smallcode](https://github.com/Doorman11991/smallcode)

Happy to answer questions about the architecture or benchmark methodology.
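To make the compound-tools idea concrete, here is a minimal Python sketch of one tool that does find → read → edit → verify in a single call. It is an illustration only, not SmallCode's actual code: the name `compound_edit`, the result dictionary, and the use of Python's built-in `compile()` as the "verify" step are all assumptions.

```python
from pathlib import Path


def compound_edit(root: Path, filename: str, old: str, new: str) -> dict:
    """Hypothetical compound tool: find, read, edit, and verify in one call."""
    # 1. Find: require exactly one file with this name under root.
    matches = sorted(root.rglob(filename))
    if len(matches) != 1:
        return {"ok": False,
                "error": f"expected 1 file named {filename}, found {len(matches)}"}
    path = matches[0]

    # 2. Read: require the target snippet to be unambiguous.
    text = path.read_text()
    if text.count(old) != 1:
        return {"ok": False,
                "error": f"snippet occurs {text.count(old)} times, expected 1"}

    # 3. Edit in memory only.
    updated = text.replace(old, new)

    # 4. Verify before writing, so a bad edit never reaches disk.
    #    (Syntax check only; a real harness would run a compiler or linter.)
    try:
        compile(updated, str(path), "exec")
    except SyntaxError as exc:
        return {"ok": False, "error": f"line {exc.lineno}: {exc.msg}"}

    path.write_text(updated)
    return {"ok": True, "path": str(path)}
```

The model issues one call and gets back either success or a single error message it can act on, so there is no intermediate state for it to lose track of between steps.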
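The improvement loop, decompose-on-failure, and escalation steps described above can be read as one control flow. The sketch below is a guess at that flow under stated assumptions, not the project's implementation: `solve`, `improvement_loop`, `Outcome`, and the callback signatures are hypothetical, and `syntax_check` stands in for whatever compiler or linter the harness actually runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Generate = Callable[[str, List[str]], str]  # (task, previous errors) -> code
Check = Callable[[str], List[str]]          # code -> error messages, empty if ok


@dataclass
class Outcome:
    code: Optional[str]
    path: str  # "local", "decomposed", "escalated", or "failed"


def syntax_check(code: str) -> List[str]:
    """Stand-in for a compiler/linter: reports Python syntax errors only."""
    try:
        compile(code, "<generated>", "exec")
        return []
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]


def improvement_loop(task: str, generate: Generate, check: Check,
                     max_attempts: int = 2) -> Optional[str]:
    """Generate, check, and feed the errors back until the check passes."""
    errors: List[str] = []
    for _ in range(max_attempts):
        code = generate(task, errors)
        errors = check(code)
        if not errors:
            return code
    return None


def solve(task: str, generate: Generate, check: Check,
          decompose: Callable[[str], List[str]],
          escalate: Optional[Callable[[str], str]] = None) -> Outcome:
    code = improvement_loop(task, generate, check)
    if code is not None:
        return Outcome(code, "local")

    # Failed twice on the same task: stop retrying and try smaller pieces.
    parts = [improvement_loop(p, generate, check) for p in decompose(task)]
    if parts and all(p is not None for p in parts):
        return Outcome("\n".join(parts), "decomposed")

    # Last resort: hand this one task to a bigger model, if one is configured.
    if escalate is not None:
        return Outcome(escalate(task), "escalated")
    return Outcome(None, "failed")


def fake_model(task: str, errors: List[str]) -> str:
    """Toy model: broken on the first try, fixed once it is shown the errors."""
    if errors:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b) return a + b"


result = solve("write add()", fake_model, syntax_check, decompose=lambda t: [])
print(result.path)  # local
```

The toy model gets the syntax wrong first, sees the error message on the second attempt, and succeeds without ever leaving the local path, which is the case the post says covers most tasks.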
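For the token-budgeting point, one plausible approach is to keep whole lines around the region of interest and replace everything else with an explicit marker, so the model never sees a cut in the middle of a statement. This is a sketch under assumptions: `fit_to_budget` and `estimate_tokens` are made-up names, and the 4-characters-per-token estimate is a rough heuristic, not how SmallCode or any particular tokenizer counts.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(lines: List[str], focus: int, budget: int) -> str:
    """Keep whole lines around `focus` (0-based) within `budget` tokens.

    The focus line is always kept. Omitted regions are replaced by a
    marker line, so no line is ever cut in the middle.
    """
    lo = hi = focus
    used = estimate_tokens(lines[focus])

    # Grow the window outward one line at a time while the budget allows.
    while True:
        grew = False
        for nxt in (lo - 1, hi + 1):
            if 0 <= nxt < len(lines):
                cost = estimate_tokens(lines[nxt])
                if used + cost <= budget:
                    used += cost
                    if nxt < lo:
                        lo = nxt
                    else:
                        hi = nxt
                    grew = True
        if not grew:
            break

    out: List[str] = []
    if lo > 0:
        out.append(f"# [{lo} earlier lines omitted]")
    out.extend(lines[lo:hi + 1])
    if hi < len(lines) - 1:
        out.append(f"# [{len(lines) - 1 - hi} later lines omitted]")
    return "\n".join(out)
```

The marker lines themselves are not counted against the budget here; a stricter version would reserve a few tokens for them.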
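The code-graph idea can be illustrated with Python's standard `ast` module: index which function calls which, then walk outward from a starting symbol instead of grepping. This is a toy version for a single Python source string, with hypothetical names (`build_call_graph`, `related_symbols`); SmallCode's real index presumably covers more languages and symbol kinds.

```python
import ast
from collections import deque
from typing import Dict, List, Set


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: Dict[str, Set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls: Set[str] = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = calls
    return graph


def related_symbols(graph: Dict[str, Set[str]], start: str,
                    max_depth: int = 2) -> List[str]:
    """Breadth-first walk over both callees and callers of `start`."""
    callers: Dict[str, Set[str]] = {}
    for fn, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(fn)

    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        neighbours = graph.get(name, set()) | callers.get(name, set())
        for other in sorted(neighbours):
            # Skip builtins and anything not defined in the indexed source.
            if other in graph and other not in seen:
                seen.add(other)
                order.append(other)
                queue.append((other, depth + 1))
    return order


SOURCE = '''
def hash_password(pw):
    return pw[::-1]

def check_login(user, pw):
    return lookup(user) == hash_password(pw)

def lookup(user):
    return "terces"

def render_footer():
    return "<footer/>"
'''

graph = build_call_graph(SOURCE)
print(related_symbols(graph, "check_login"))
# ['check_login', 'hash_password', 'lookup']
```

Starting from `check_login`, the walk returns the two functions it depends on and leaves out `render_footer`, which is the "just the relevant connected code" behaviour the post describes.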


