@0xLogicrw: Former OpenAI post-training core member Jiayi Weng proposed a new reinforcement learning paradigm called "Heuristic Learning" in his personal capacity and open-sourced all experimental code. He used Codex (GPT-5.4) to repeatedly play the Atari game Breakout, but GPT-5.4 was never retrained...
Summary
Former OpenAI researcher Jiayi Weng proposed a new paradigm called "Heuristic Learning", which uses large language models to generate and iteratively modify Python code that solves reinforcement learning tasks. Knowledge is stored in interpretable code rather than in neural network parameters, which avoids catastrophic forgetting. The approach reportedly achieves strong results on the Atari and MuJoCo benchmarks, and the code has been open-sourced.
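The idea of keeping a policy as editable Python rather than as network weights can be illustrated with a small sketch. This is not Weng's code. The toy paddle environment, the `evaluate` and `heuristic_search` helpers, and the fixed `CANDIDATE_SOURCES` list (a stand-in for successive model-written revisions) are all assumptions made for illustration. A real loop would also feed evaluation results back to the model to request the next revision, which this sketch omits.

```python
import random

# Hypothetical stand-ins for successive LLM-written revisions of one policy.
# In the described paradigm these would come from a code-writing model.
CANDIDATE_SOURCES = [
    # Revision 0: never move the paddle.
    "def policy(ball_x, paddle_x):\n    return 0\n",
    # Revision 1: always move right.
    "def policy(ball_x, paddle_x):\n    return 1\n",
    # Revision 2: move toward the ball.
    "def policy(ball_x, paddle_x):\n"
    "    if ball_x > paddle_x:\n        return 1\n"
    "    if ball_x < paddle_x:\n        return -1\n"
    "    return 0\n",
]


def evaluate(policy, episodes=20, steps=50, seed=0):
    """Return the fraction of steps on which the paddle sits under the ball.

    Toy environment (an assumption, not Atari): positions are integers 0..9,
    the ball takes a random step each tick, then the paddle moves by the
    policy's action (-1, 0, or 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(episodes):
        ball_x, paddle_x = rng.randint(0, 9), rng.randint(0, 9)
        for _ in range(steps):
            ball_x = min(9, max(0, ball_x + rng.choice((-1, 0, 1))))
            paddle_x = min(9, max(0, paddle_x + policy(ball_x, paddle_x)))
            hits += ball_x == paddle_x
    return hits / (episodes * steps)


def load_policy(source):
    """Compile policy source text into a callable."""
    namespace = {}
    exec(source, namespace)
    return namespace["policy"]


def heuristic_search(sources):
    """Score each candidate revision and keep the best-scoring source text."""
    best_source, best_score = None, float("-inf")
    for source in sources:
        score = evaluate(load_policy(source))
        if score > best_score:
            best_source, best_score = source, score
    return best_source, best_score


best_source, best_score = heuristic_search(CANDIDATE_SOURCES)
print(best_source)
```

The retained artifact is the source text itself, so what the policy has "learned" can be read and edited directly, which is the interpretability property the summary describes.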
Similar Articles
@0xLogicrw: OpenAI post-training core member Jiayi Weng previously proved that 'purely relying on a large model to write code can beat Atari games.' Fluid dynamics PhD student Paul Garnier has now brought this approach to the more hardcore field of fluid dynamics control. He never trained any neural network. He simply had Codex 5.5 act as a programmer...
Fluid dynamics PhD student Paul Garnier used OpenAI's Codex 5.5 model to solve fluid dynamics control tasks purely through code generation, without training any neural network. The approach surpassed reinforcement learning baselines in multiple tests, at low cost and with interpretable results.
@Gracker_Gao: AI Papers: Strong AI Doesn't Write Code by Writing Code. Two recent arXiv papers reveal a counterintuitive finding: when encountering an unfamiliar programming language, GPT-5.4 and Claude Opus 4.6 don't directly write code in the target language. Instead, they write a Python program to generate the target code, then debug it locally. This "meta-…
Two recent arXiv papers found that GPT-5.4 and Claude Opus 4.6 employ a metaprogramming strategy when handling unfamiliar programming languages: they generate the target code with a Python program and debug it locally, rather than writing the target language directly. According to the papers, this strategy is key to distinguishing top-tier agents from average ones, and strategy sophistication matters more than model parameter scale.
@MaxForAI: Tian Yuandong @tydsh's startup team Recursive @Recursive_SI released a milestone: an automated AI research system. In this system, AI can complete the entire research loop of 'propose ideas → implement → run experiments → verify → select next experiment based on results'. Results show that with clear objectives...
The Recursive team released an automated AI research system that can autonomously complete the research loop, surpassing existing human community solutions on multiple benchmarks. For example, on NanoGPT Speedrun it compressed training time from 79.7 seconds to 77.5 seconds, and on SOL-ExecBench it improved the score to 0.754.
@FinanceYF5: 2/ His name is Lenny Bogdonoff. He joined OpenAI when it only had 250 people, while GPT-4 was still being trained and ChatGPT hadn't launched yet. His first task: rebuilding the Jupyter code execution environment, which later became the prototype for the 'AI computer' concept. He didn't realize how important this was, and most people didn't either.
Lenny Bogdonoff, an early OpenAI employee, rebuilt the Jupyter code execution environment while GPT-4 was still being trained and before ChatGPT had launched. This work became the prototype for the later 'AI computer' concept, but its importance wasn't recognized at the time.
@vintcessun: This project is insane: it builds the GPT behind ChatGPT from scratch in a way even a kid can understand. Every line of code is commented, with 12 chapters and over 7,500 lines, and it even explains the attention mechanism details that I could never figure out. Simply put, if you want to 'understand' LLMs rather than just 'import packages', this is the most beginner-friendly hands-on tutorial right now.
A 12-chapter interactive textbook that teaches how to build a GPT-like language model from absolute scratch, with fully annotated code and beginner-friendly explanations.