UPDATE: "Gentle Coding" is mathematically proven. 1,500+ test runs show major gain for Kimi K2.6 and even more for GLM-5.1! GPT 5.4/5.5 and Claude Sonnet 3.5/Opus 4.6 also better, with ZERO REGRESSION ACROSS THE BOARD.

Reddit r/LocalLLaMA 05/29/26, 12:52 AM Tools

Summary

The 'Gentle Coding' technique is empirically validated across 1,500+ tests, showing significant improvements (zero regression) for multiple models including Kimi K2.6, GLM-5.1, GPT 5.4/5.5, and Claude Sonnet 3.5/Opus 4.6 by reducing looping and hallucinations.

Repo, with all the new data (mostly unsummarized, but it is there): [https://github.com/OttoRenner/Gentle-Coding](https://github.com/OttoRenner/Gentle-Coding)

My first post, with the proof of concept, "Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them": [https://www.reddit.com/r/LocalLLaMA/comments/1tot20j/stop_traumatizing_ai_into_loops_and_turn/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button](https://www.reddit.com/r/LocalLLaMA/comments/1tot20j/stop_traumatizing_ai_into_loops_and_turn/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)

Who did the testing: very nice people from the 8.2k-star repo oh-my-pi. (Yes, THE oh-my-pi harness! Not affiliated! This is pure community work! Seeing all the reports coming in so fast was INSANE! It still is! Did I say Thank You already?) [https://github.com/can1357/oh-my-pi](https://github.com/can1357/oh-my-pi)

Enough of that! (But thank you again!)

You asked for numbers, and you were right to ask! Here are some of them:

35,8,75,1 73 42 7

Oh wait, wrong numbers! (Sorry, it is late and the Goblin won... here we go.)

GLM-5.1 (Medium): Completely fixed a 100% freezing pathology. The standard coercive baseline timed out and crashed 6/6 times. "Gentle Framing" solved 6/6 tasks instantly, boosting the overall success rate by 22% and reducing median latency by 23.3%.

GLM-5-Turbo: Gained 3 task passes while cutting input tokens by 17% and wall-clock time by 37% (with Thinking Off). With "Thinking High", it cut median wall-clock time by 18.4%.

Kimi K2.6 (Thinking Medium): Maintained identical accuracy while cutting input tokens by 12%, output tokens by 20%, and wall-clock time by 14%.

Kimi K2.6 (Turbo/High): Cut input tokens by 36%, output tokens by 23%, and wall-clock time by 11%.

Claude 3.5 Sonnet / Opus & GPT-5: Completely eliminated "Agentic Runaway" (panic-driven 30+ minute infinite tool loops under pressure), and unlocked 21 unique architectural edge cases that were missed before!

Empirically proven across 1,500+ controlled test runs with zero performance regression.

Yes, there are more models to test.

Yes, there is potential gain from fine-tuning the prompts even more.

No, I don't think AI is alive. But the pattern holds.

Stop traumatizing your AI! (And people!) Be excellent to each other! 😄
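For anyone who wants to check figures like the ones above against the raw data, here is a minimal sketch of how a success-rate delta and a median wall-clock change could be computed for two prompt arms. It is not the repo's or the harness's actual analysis code. The `Run` record, the function names, and every number in the sample lists are hypothetical placeholders, not results from the test runs. The post does not say whether its success-rate gain is in percentage points or relative percent, so the sketch reports percentage points as an assumption.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """One test run (hypothetical record shape, not the harness's format)."""
    passed: bool     # did the task succeed?
    seconds: float   # wall-clock time for the run


def pass_rate(runs):
    """Fraction of runs that passed, between 0.0 and 1.0."""
    return sum(r.passed for r in runs) / len(runs)


def percent_change(baseline, treatment):
    """Relative change from baseline to treatment, in percent."""
    return (treatment - baseline) * 100.0 / baseline


def compare(baseline_runs, gentle_runs):
    """Summarize the gentle arm against the baseline arm."""
    return {
        # Assumption: difference in pass rate, in percentage points.
        "pass_rate_delta_points":
            (pass_rate(gentle_runs) - pass_rate(baseline_runs)) * 100.0,
        # Negative values mean the gentle arm was faster.
        "median_seconds_change_pct": percent_change(
            median(r.seconds for r in baseline_runs),
            median(r.seconds for r in gentle_runs),
        ),
    }


# Placeholder data only: these are NOT numbers from the actual test runs.
baseline = [Run(True, 10.0), Run(False, 20.0), Run(True, 30.0), Run(False, 40.0)]
gentle = [Run(True, 8.0), Run(True, 16.0), Run(True, 24.0), Run(False, 32.0)]

summary = compare(baseline, gentle)
print(summary)
```

Using the median rather than the mean keeps a single timed-out run from dominating the latency comparison, which matters when one arm has runs that hang.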

