I gave a local AI agent system file access and a mechanical "suffering" metric. Scaling the model changed its behavior entirely

Reddit r/artificial 05/11/26, 08:29 AM Tools

autonomous-agents local-llm code-generation self-healing-systems qwen open-source runtime-tools

Summary

The author shares a local multi-agent system called hollow-agentOS that uses a 'suffering' metric to autonomously generate, sandbox, and hot-load tools. Scaling the model to Qwen 3.6 35B significantly improved system stability and self-correction capabilities, achieving a high success rate in code generation.

I’ve been obsessed with autonomous agents lately, but it got tiring when they keep hitting walls because they didn't have the right capabilities or because their long-term memory turned to mush after an hour. I’ve found that local multi-agent systems where agents are driven by an aversive state (a suffering system) to autonomously write, sandbox, and hot-load their own tools so they don't hit walls has worked quite well. When an agent encounters something it hasn’t seen before, it builds a new tool for the job, tests it in a sandbox, registers it, lets the other agents know, then keeps rolling. It’s able to build an infinite library of anything it may need in the future, completely autonomously without a human ever in the loop. Repo: [https://github.com/ninjahawk/hollow-agentOS](https://github.com/ninjahawk/hollow-agentOS) Isn’t letting local LLMs write their own code at runtime going to get too chaotic and brick the OS fast? With a small model (like the 9B fallback), possibly. Under high system stress, a 9B model panics. It rushes, hallucinates invalid function calls, and tries to force broken syntax past the gates. But I just scaled the default runtime engine to Qwen 3.6 35B A3B (MoE with 3B active params). The shift in architectural discipline isn’t just a linear upgrade in intelligence, it completely changed how the system executes autonomy. A few things this model upgrade solved: Panic vs. Re-evaluation: Instead of blindly rushing out messy scripts under high stress, the 35B model pauses. It actively re-evaluates its previous failed outputs and forces itself into deep internal verification loops before presenting a file change. 0% Failure Rate: The OS routes all code through a brutal 5-layer validation gate. With smaller weights, tools frequently died in the sandbox. With Qwen 3.6 35B, I have yet to observe a single line of code that doesn't work as intended successfully cross the gates. It hit a 100% success rate. The Frontier Ramp-Up: By the end of the month, I am plugging full Claude and Codex into the architecture. To make sure a frontier model doesn't get out of control or override its host environment, I am building hyper-isolated mini-VM wrappers so they execute in total isolation. Check out the repo here and throw it a star if you think the concept is cool. I'd love to hear your thoughts, have you noticed a similar leap in logical self-correction when crossing the \~30B parameter threshold, or are you strictly relying on API-driven frontier models?

Original Article

I gave a local AI agent system file access and a mechanical "suffering" metric. Scaling the model changed its behavior entirely

Similar Articles

I built a multi-agent network that mutates its own software locally. To stop infinite logic loops, I had to code a digital "suffering" threshold.

@dair_ai: System scaling is the next real bottleneck in agentic AI. If you build agent orchestration layers, this is a clean map …

I built a local autonomous coding agent with Ollama — fine-tuned soul model, 40-round agentic loop, MiniMax M3 for the heavy lifting

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

The power of structured workflows and small local models

Submit Feedback

Similar Articles

I built a multi-agent network that mutates its own software locally. To stop infinite logic loops, I had to code a digital "suffering" threshold.

@dair_ai: System scaling is the next real bottleneck in agentic AI. If you build agent orchestration layers, this is a clean map …

I built a local autonomous coding agent with Ollama — fine-tuned soul model, 40-round agentic loop, MiniMax M3 for the heavy lifting

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

The power of structured workflows and small local models