@0xLogicrw: Former OpenAI post-training core member Jiayi Weng proposed a new reinforcement learning paradigm called "Heuristic Learning" in his personal capacity and open-sourced all experimental code. He used Codex (GPT-5.4) to repeatedly play the Atari game Breakout, but GPT-5.4 was never retrained...
Summary
Former OpenAI researcher Jiayi Weng has proposed a paradigm called "Heuristic Learning", in which a large language model generates and iteratively revises Python code to solve reinforcement learning tasks. Because the learned knowledge is stored in interpretable code rather than in neural network parameters, the approach is said to avoid catastrophic forgetting. It reportedly performs strongly on the Atari and MuJoCo benchmarks, and the code has been open-sourced.
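To make the idea of "knowledge stored in code" concrete, here is a minimal sketch of a generate, evaluate, and keep-if-better loop. It is not the author's released implementation. The toy 1-D catch game stands in for Atari, and `propose_revision` is a hypothetical stub standing in for the LLM call; it simply cycles through hand-written candidates. All function names here are assumptions made for illustration.

```python
import random

def run_episode(policy, seed, steps=200):
    """Toy 1-D catch game: one point for each step the paddle sits under the ball."""
    rng = random.Random(seed)
    ball, paddle, score = rng.randint(0, 9), 5, 0
    for _ in range(steps):
        ball = min(9, max(0, ball + rng.choice((-1, 0, 1))))
        paddle = min(9, max(0, paddle + policy(ball, paddle)))
        score += int(paddle == ball)
    return score

def compile_policy(source):
    """Turn policy source text into a callable; the source itself is the stored knowledge."""
    namespace = {}
    exec(source, namespace)
    return namespace["policy"]

def evaluate(source, seeds=range(5)):
    """Average score of a policy over a fixed set of seeded episodes."""
    policy = compile_policy(source)
    return sum(run_episode(policy, s) for s in seeds) / len(seeds)

# Hand-written candidates that the stub below hands out in order (illustrative only).
CANDIDATES = [
    "def policy(ball, paddle):\n    return 1\n",
    "def policy(ball, paddle):\n    return 1 if ball > paddle else -1\n",
    "def policy(ball, paddle):\n    return (ball > paddle) - (ball < paddle)\n",
]

def propose_revision(current_source, score, round_index):
    """Hypothetical stand-in for an LLM that would read the code and score and rewrite it."""
    return CANDIDATES[round_index % len(CANDIDATES)]

def heuristic_learning(initial_source, rounds=3):
    """Keep a candidate only when it scores strictly higher than the current best."""
    best_source, best_score = initial_source, evaluate(initial_source)
    for i in range(rounds):
        candidate = propose_revision(best_source, best_score, i)
        score = evaluate(candidate)
        if score > best_score:
            best_source, best_score = candidate, score
    return best_source, best_score

INITIAL = "def policy(ball, paddle):\n    return 0\n"
best_source, best_score = heuristic_learning(INITIAL)
print(best_source)
print("average score:", best_score)
```

No model weights change anywhere in this loop. The only thing that improves is the policy source text, which stays readable and can be versioned like any other code.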
Similar Articles
Building more with GPT-5.1-Codex-Max
OpenAI introduces GPT-5.1-Codex-Max, a new agentic coding model with improved reasoning, token efficiency, and the ability to maintain coherent work across millions of tokens through a 'compaction' mechanism. The model is faster, more intelligent, and can sustain long-running tasks for hours or days, representing a significant advancement in AI-assisted software engineering.
@qloog: Stop calling AI a mere efficiency booster. This OpenAI-endorsed Codex tutorial lets one person do an entire team’s job—iOS app, code, investor deck—end-to-end. Two levers: custom skills (reusable know-how) + automation (exponential speed).
An OpenAI-endorsed Codex tutorial shows how a solo developer can build an iOS app, write the code, and generate an investor deck using reusable custom skills and automation.
Introducing GPT-5.3-Codex
OpenAI introduces GPT-5.3-Codex, an advanced agentic coding model that combines frontier coding capabilities with reasoning and professional knowledge, achieving state-of-the-art performance on SWE-Bench Pro and Terminal-Bench while being 25% faster than its predecessor.
Coding and design with GPT-5
OpenAI announces GPT-5 capabilities for coding and design tasks, demonstrating advanced applications of the latest model across software development and creative design workflows.
Addendum to GPT-5 system card: GPT-5-Codex
OpenAI has released GPT-5-Codex, a version of GPT-5 optimized for agentic coding tasks, trained with reinforcement learning on real-world coding environments. It is available via Codex CLI, IDE extensions, GitHub, and ChatGPT mobile, with comprehensive safety measures including sandboxing and prompt injection mitigations.