@0xLogicrw: Former OpenAI post-training core member Jiayi Weng proposed a new reinforcement learning paradigm called "Heuristic Learning" in his personal capacity and open-sourced all experimental code. He used Codex (GPT-5.4) to repeatedly play the Atari game Breakout, but GPT-5.4 was never retrained...

X AI KOLs Timeline Papers

Summary

Former OpenAI researcher Jiayi Weng proposed a new paradigm called "Heuristic Learning", in which a large language model generates and iteratively modifies Python code to solve reinforcement learning tasks. Knowledge is stored in interpretable code rather than in neural network parameters, which is intended to avoid catastrophic forgetting. The approach reached a perfect score on Atari Breakout, deep-RL-level performance on MuJoCo Ant, and came close to the PPO baseline on the Atari57 suite. The code has been open-sourced.

Former OpenAI post-training core member Jiayi Weng, working in a personal capacity, proposed a new reinforcement learning paradigm called "Heuristic Learning" and open-sourced all of the experimental code. He had Codex (GPT-5.4) play the Atari game Breakout repeatedly, but GPT-5.4 itself was never retrained. What actually improved was the game strategy code that GPT-5.4 wrote.

The process went like this: GPT-5.4 first wrote a Python strategy for Breakout, ran a round, watched the replay, identified where it missed the ball, then modified the code and ran it again. After several iterations, the strategy's score rose from 387 to a perfect 864. No neural network was trained at any point; the gains came entirely from the AI repeatedly revising if-else rules, adjusting landing predictions, and adding infinite-loop detection. The final code included a ball trajectory predictor, a stuck-ball detector, regression tests, and experiment logs, and had grown into a complete software system.

The core difference from traditional reinforcement learning lies in where the learned knowledge is stored. Traditional methods compress knowledge into neural network parameters, which humans cannot read and which tend to overwrite old knowledge when a new task is learned (catastrophic forgetting). Weng's approach reverses this: the knowledge is code, so it is readable, modifiable, and can be locked in with tests, and learning something new does not overwrite old skills.

Besides the perfect score in Breakout, he also reached deep-RL-level performance (over 6000 points) on MuJoCo Ant (simulated ant walking) and came close to the PPO baseline on the full Atari57 suite of 57 games. However, Weng explicitly marked the limits of the approach: pure code cannot handle complex perception tasks, such as recognizing images with Python if-else rules.
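
To make the "landing prediction" idea above concrete, here is a minimal sketch of what such a rule might look like. It is not taken from Weng's released code. The function names, the coordinate convention (y grows downward, side walls at x = 0 and x = width), and the `deadband` parameter are all assumptions for illustration, and the sketch ignores bounces off bricks and the ceiling.

```python
def predict_landing_x(x, y, vx, vy, paddle_y, width):
    """Estimate where the ball crosses the paddle row, mirroring off the side walls.

    Hypothetical helper: assumes y grows downward and constant velocity.
    Returns None while the ball is moving up (or not moving vertically).
    """
    if vy <= 0:
        return None
    t = (paddle_y - y) / vy          # time until the ball reaches the paddle row
    raw = x + vx * t                 # landing point if there were no side walls
    period = 2 * width               # one full left-right-left reflection cycle
    folded = raw % period            # Python's % keeps this in [0, period)
    return folded if folded <= width else period - folded


def choose_action(paddle_x, target_x, deadband=2.0):
    """Plain if-else paddle rule: move toward the predicted landing point."""
    if target_x is None:
        return "NOOP"
    if target_x > paddle_x + deadband:
        return "RIGHT"
    if target_x < paddle_x - deadband:
        return "LEFT"
    return "NOOP"
```

Because the rule is ordinary code, a fix for one missed ball can be pinned down with a small assertion on `predict_landing_x`, which is the sense in which tests can "lock" a skill once it has been learned.
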
His envisioned endgame is a hybrid architecture: at the bottom, lightweight neural networks handle vision and perception; in the middle, heuristic learning handles real-time logic and safety rules; at the top, large models review logs and modify code, periodically updating themselves with high-quality data accumulated from the lower layers. Handwritten rules were abandoned in the past not because they were useless, but because humans could not maintain them. Now that AI can write code quickly and well, this old approach has become viable again.
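
The three layers described above can be sketched roughly as follows. This is only an illustration of the layering under stated assumptions, not Weng's design: `Observation`, `perceive`, `heuristic_policy`, `run_episode`, the feature names, and the thresholds are all hypothetical, and the perception layer is a stub standing in for a small neural network.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    obstacle_distance: float  # hypothetical perception output
    target_offset: float      # hypothetical signed lateral offset to the target


def perceive(raw_frame):
    """Bottom layer stand-in: a real system would run a lightweight network here.

    Assumption: raw_frame is already a dict holding the two features.
    """
    return Observation(raw_frame["obstacle_distance"], raw_frame["target_offset"])


def heuristic_policy(obs, min_safe_distance=1.0, tolerance=0.1):
    """Middle layer: readable real-time logic, with the safety rule checked first."""
    if obs.obstacle_distance < min_safe_distance:
        return "STOP"
    if obs.target_offset > tolerance:
        return "RIGHT"
    if obs.target_offset < -tolerance:
        return "LEFT"
    return "FORWARD"


def run_episode(frames, log):
    """Run the two lower layers and record every decision.

    The top layer (a large model) is not implemented; in this sketch it would
    read `log` offline and propose edits to heuristic_policy.
    """
    for frame in frames:
        obs = perceive(frame)
        log.append((obs, heuristic_policy(obs)))
    return log
```

The point of the split is that only `perceive` would need learned parameters, while the rules and the log stay in plain code that a model or a human can inspect and revise.
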

Similar Articles

@0xLogicrw: OpenAI post-training core member Jiayi Weng previously proved that "purely relying on a large model to write code can beat Atari games." Fluid dynamics PhD student Paul Garnier has now brought this approach to the more hardcore field of fluid dynamics control. He never trained any neural network. He simply had Codex 5.5 act as a programmer...

A fluid dynamics PhD student used OpenAI's Codex 5.5 model to achieve fluid dynamics control purely through code generation, without training any neural network. It surpassed reinforcement learning baselines in multiple tests, with low cost and interpretable results.

@Gracker_Gao: AI Papers: Strong AI Doesn't Write Code by Writing Code. Two recent arXiv papers reveal a counterintuitive finding: when encountering an unfamiliar programming language, GPT-5.4 and Claude Opus 4.6 don't write code in the target language directly. Instead, they write a Python program to generate the target code, then debug it locally. This "meta-…

Two recent arXiv papers found that GPT-5.4 and Claude Opus 4.6 employ a metaprogramming strategy when handling unfamiliar programming languages — generating target code with Python and debugging locally — rather than writing the target language code directly. This strategy is key to distinguishing top-tier agents from average ones, and strategy sophistication matters more than model parameter scale.

@MaxForAI: Tian Yuandong @tydsh's startup team Recursive @Recursive_SI released a milestone: an automated AI research system. In this system, AI can complete the entire research loop of 'propose ideas → implement → run experiments → verify → select next experiment based on results'. Results show that with clear objectives...

The Recursive team released an automated AI research system that can autonomously complete the research loop, surpassing existing human community solutions on multiple benchmarks. For example, on NanoGPT Speedrun it compressed training time from 79.7 seconds to 77.5 seconds, and on SOL-ExecBench it improved the score to 0.754.

@FinanceYF5: 2/ His name is Lenny Bogdonoff. He joined OpenAI when it only had 250 people, while GPT-4 was still being trained and ChatGPT hadn't launched yet. His first task: rebuilding the Jupyter code execution environment, which later became the prototype for the 'AI computer' concept. He didn't realize how important this was, and most people didn't either.

Lenny Bogdonoff, an early OpenAI employee, rebuilt the Jupyter code execution environment while GPT-4 was still being trained and before ChatGPT launched. This work became the prototype for the later 'AI computer' concept, but its importance was not recognized at the time.

@vintcessun: This project is insane: it builds the GPT behind ChatGPT from scratch in a way even a kid can understand. Every line of code is commented, there are 12 chapters and over 7500 lines, and it even explains the attention mechanism details that I could never figure out. Simply put, if you want to 'understand' LLMs rather than just 'import packages', this is the most beginner-friendly hands-on tutorial right now.

A 12-chapter interactive textbook that teaches how to build a GPT-like language model from absolute scratch, with fully annotated code and beginner-friendly explanations.