UPDATE: "Gentle Coding" is mathematically proven. 1,500+ test runs show major gain for Kimi K2.6 and even more for GLM-5.1! GPT 5.4/5.5 and Claude Sonnet 3.5/Opus 4.6 also better, with ZERO REGRESSION ACROSS THE BOARD.
The 'Gentle Coding' technique is empirically validated across 1,500+ tests, showing significant improvements (zero regression) for multiple models including Kimi K2.6, GLM-5.1, GPT 5.4/5.5, and Claude Sonnet 3.5/Opus 4.6 by reducing looping and hallucinations.
Repo, with all the new data (mostly unsummarized, but it is there): [https://github.com/OttoRenner/Gentle-Coding](https://github.com/OttoRenner/Gentle-Coding)

My first post with the Proof of Concept, "Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them": [https://www.reddit.com/r/LocalLLaMA/comments/1tot20j/stop\_traumatizing\_ai\_into\_loops\_and\_turn/?utm\_source=share&utm\_medium=web3x&utm\_name=web3xcss&utm\_term=1&utm\_content=share\_button](https://www.reddit.com/r/LocalLLaMA/comments/1tot20j/stop_traumatizing_ai_into_loops_and_turn/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)

Who did the testing: Very nice people from the 8.2k star repo oh-my-pi (Yes, THE oh-my-pi harness! Not affiliated! This is pure community work! Seeing all the reports coming in so fast was INSANE! It still is! Did I say Thank You already?) [https://github.com/can1357/oh-my-pi](https://github.com/can1357/oh-my-pi)

Enough of that! (But thank you again!)

You asked for numbers, and you were right to ask! Here are some of them:

35,8,75,1 73 42 7

Oh wait, wrong numbers! (Sorry, it is late and the Goblin won... here we go.)

- GLM-5.1 (Medium): Completely fixed a 100% freezing pathology. The standard coercive baseline timed out and crashed 6/6 times. "Gentle Framing" solved 6/6 tasks instantly, boosting the overall success rate by 22% with a 23.3% reduction in median latency.
- GLM-5-Turbo: Gained 3 additional task passes while cutting input tokens by 17% and wall-clock time by 37% (with Thinking Off). With "Thinking High", it cut median wall-clock time by 18.4%.
- Kimi K2.6 (Thinking Medium): Maintained identical accuracy while cutting token overhead by 12% (input) and 20% (output), and wall-clock time by 14%.
- Kimi K2.6 (Turbo/High): Cut input tokens by 36%, output tokens by 23%, and wall-clock time by 11%.
- Claude 3.5 Sonnet / Opus & GPT-5: Completely eliminated "Agentic Runaway" (panic-driven 30+ minute infinite tool loops under pressure) and unlocked 21 unique architectural edge cases that were missed before!

Empirically proven across 1,500+ controlled test runs with zero performance regression.

Yes, there are more models to test.

Yes, there is potential gain from fine-tuning the prompts even more.

No, I don't think AI is alive. But the pattern holds.

Stop traumatizing your AI! (and people!) Be excellent to each other! 😄
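For anyone who wants to check numbers like the ones above against the raw reports, here is a minimal sketch of how a baseline-vs-gentle comparison could be computed. It is not the script used for the runs in the repo. The record fields (`passed`, `latency_s`, `input_tokens`) and the function names are hypothetical, and the sample data is made up purely to show the arithmetic. The post does not say whether success-rate gains are percentage points or relative change; this sketch reports percentage points.

```python
from statistics import median


def pct_change(baseline, treatment):
    """Relative change of treatment vs. baseline in percent (negative = reduction)."""
    return (treatment - baseline) / baseline * 100.0


def summarize(runs):
    """Aggregate a list of run records (hypothetical schema, not the repo's format)."""
    return {
        "success_rate": sum(1 for r in runs if r["passed"]) / len(runs),
        "median_latency_s": median(r["latency_s"] for r in runs),
        "median_input_tokens": median(r["input_tokens"] for r in runs),
    }


def compare_runs(baseline_runs, gentle_runs):
    """Compare the two prompt framings on success rate, latency, and input tokens."""
    base = summarize(baseline_runs)
    gentle = summarize(gentle_runs)
    return {
        # Difference in success rate, in percentage points.
        "success_delta_pts": (gentle["success_rate"] - base["success_rate"]) * 100.0,
        "median_latency_change_pct": pct_change(
            base["median_latency_s"], gentle["median_latency_s"]
        ),
        "median_input_tokens_change_pct": pct_change(
            base["median_input_tokens"], gentle["median_input_tokens"]
        ),
    }


# Made-up illustrative data, NOT taken from the actual test reports.
baseline_runs = [
    {"passed": True, "latency_s": 10, "input_tokens": 1000},
    {"passed": False, "latency_s": 20, "input_tokens": 1000},
    {"passed": True, "latency_s": 30, "input_tokens": 1000},
    {"passed": False, "latency_s": 40, "input_tokens": 1000},
]
gentle_runs = [
    {"passed": True, "latency_s": 8, "input_tokens": 800},
    {"passed": True, "latency_s": 16, "input_tokens": 800},
    {"passed": True, "latency_s": 24, "input_tokens": 800},
    {"passed": False, "latency_s": 32, "input_tokens": 800},
]

result = compare_runs(baseline_runs, gentle_runs)
for name, value in result.items():
    print(f"{name}: {value:+.1f}")
```

Using the median rather than the mean keeps a single 30+ minute runaway loop from dominating the latency comparison, which matters when the baseline includes timeouts.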
The tweet claims that the open-source Kimi K2.6 model has surpassed Claude Opus 4.7, marking a significant milestone for open-source AI in just three months. It provides a link to a full guide and prompts to verify the comparison.
GPT-5.5 sets new state-of-the-art in benchmarks but struggles with hallucination; Kimi K2.6 leads open LLMs; also discusses AI's strain on climate pledges and strategic thinking in LLMs.