Tag
This paper formalizes signed compression progress on a sealed audit as a Goodhart-resistant reward, proves that cumulative reward telescopes to genuine audit improvement, and gives bounds for finite audit panels. It also identifies failure modes and validates the results with experiments.
A developer details the creation of LIA, an AI that runs continuously on a Linux system with its own directory, creates files autonomously, and operates based on intrinsic responsibility rather than prompts or RLHF; a preprint on SSRN and 12,000+ lines of custom Python code are provided.
OpenAI presents a large-scale empirical study of curiosity-driven reinforcement learning without extrinsic rewards across 54 benchmark environments, showing strong performance and investigating the role of feature spaces in prediction-based reward signals.
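The telescoping claim in the first summary can be illustrated with a minimal sketch. It assumes the per-step reward is the signed drop in code length on the audit set, which is one reading of "signed compression progress" rather than the paper's confirmed definition. The function name `signed_progress_rewards` and the code lengths are hypothetical, chosen only for illustration.

```python
def signed_progress_rewards(audit_code_lengths):
    """Per-step reward = previous audit code length minus current one.

    Assumption: a positive value means the audit compressed better,
    a negative value means it regressed.
    """
    return [
        prev - curr
        for prev, curr in zip(audit_code_lengths, audit_code_lengths[1:])
    ]


# Hypothetical audit code lengths (in bits) after each update; made-up numbers.
lengths = [1000, 940, 955, 900, 870]

rewards = signed_progress_rewards(lengths)
total_reward = sum(rewards)
net_improvement = lengths[0] - lengths[-1]

# The intermediate terms cancel, so the sum depends only on the endpoints.
print(rewards)          # [60, -15, 55, 30]
print(total_reward)     # 130
print(net_improvement)  # 130
```

Under this assumption, a regression is charged as negative reward, so a policy that oscillates between better and worse audit scores gains nothing in total: only the net change from the first to the last step counts.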