The article proposes using coding agents to maintain and iterate on a system of programmatic policies in place of neural network gradient updates. The approach reached baseline-level performance on deep RL benchmarks and is considered a potential new paradigm following pre-training and RLHF.
Former OpenAI researcher Jiayi Weng proposed a new paradigm called "Heuristic Learning", which uses large language models to generate and iteratively modify Python code that solves reinforcement learning tasks. Because knowledge is stored in interpretable code rather than in neural network parameters, the approach effectively avoids catastrophic forgetting. It has achieved excellent results on the Atari and MuJoCo benchmarks, and the code has been open-sourced.
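To make the idea of "knowledge stored in code" concrete, here is a minimal sketch of such a loop. It is not the author's implementation: the toy 1-D task, the names `run_episode`, `evaluate`, `propose_revision`, and `CANDIDATES`, and the keep-if-better rule are all assumptions chosen for illustration, and the language model is replaced by a fixed queue of hand-written revisions so the example runs offline.

```python
def run_episode(act, start, steps=20):
    """Toy 1-D task (an assumption): move an integer position toward 0."""
    pos, total = start, 0.0
    for _ in range(steps):
        pos += max(-1, min(1, int(act(pos))))  # clamp the action to {-1, 0, 1}
        total -= abs(pos)                      # reward is negative distance to 0
    return total


def evaluate(source, starts=range(-5, 6)):
    """Compile the policy source text and return its mean episode return."""
    namespace = {}
    exec(source, namespace)  # the policy is readable source code, not weights
    act = namespace["act"]
    return sum(run_episode(act, s) for s in starts) / len(starts)


# Hypothetical stand-in for the coding agent: a fixed queue of revisions.
CANDIDATES = [
    "def act(obs):\n    return 1\n",
    "def act(obs):\n    return -1 if obs > 0 else 1\n",
    "def act(obs):\n    return 0 if obs == 0 else (-1 if obs > 0 else 1)\n",
]


def propose_revision(current_source, score, step):
    """A real system would prompt an LLM with the code and its score here."""
    return CANDIDATES[step % len(CANDIDATES)]


def heuristic_learning(initial_source, iterations=3):
    """Iterate on policy source code, keeping a revision only if it scores higher."""
    best_source, best_score = initial_source, evaluate(initial_source)
    history = [best_score]
    for step in range(iterations):
        candidate = propose_revision(best_source, best_score, step)
        score = evaluate(candidate)
        if score > best_score:
            best_source, best_score = candidate, score
        history.append(best_score)
    return best_source, best_score, history


best_source, best_score, history = heuristic_learning("def act(obs):\n    return 0\n")
print(best_source)
print(history)
```

In this sketch the only thing that persists between iterations is the source text of `act`, so a revision that does not improve the score is simply discarded and the earlier behavior remains intact.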