Fulcrum Research introduces Inverse Rubric Optimization (IRO), a testbed for studying long-horizon agent behavior in which agents must optimize for the preferences of a black-box judge. The approach enables smooth scaling and rich behavior analysis, and experiments show that frontier models such as Fable 5 and Opus 4.6 have different scaling characteristics.