scientific-validity

Search Discipline for Long-Horizon Research Agents

arXiv cs.AI · 14h ago

This paper identifies a failure mode in long-horizon research agents: optimizing an aggregate metric can select candidates that improve the headline number while breaking critical subgroups (inversion). It proposes a search-discipline protocol in which an external control loop audits each candidate on its disaggregated behavior rather than on the aggregate score.
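The summary does not spell out the protocol, so the following is only a minimal sketch of what a disaggregated audit could look like, not the paper's method. The names `Candidate`, `aggregate`, `regressed_subgroups`, and `audit`, the size-weighted mean, and the `tolerance` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container: per-subgroup scores (higher is better) and sizes."""
    name: str
    scores: dict[str, float]
    sizes: dict[str, int]


def aggregate(c: Candidate) -> float:
    """Size-weighted mean over subgroups (an assumed form of the headline metric)."""
    total = sum(c.sizes.values())
    return sum(c.scores[g] * c.sizes[g] for g in c.scores) / total


def regressed_subgroups(baseline: Candidate, candidate: Candidate,
                        tolerance: float = 0.0) -> list[str]:
    """Subgroups where the candidate scores below the baseline by more than tolerance."""
    return sorted(
        g for g, base in baseline.scores.items()
        if candidate.scores[g] < base - tolerance
    )


def audit(baseline: Candidate, candidate: Candidate,
          tolerance: float = 0.0) -> dict:
    """Accept only if the aggregate improves and no subgroup regresses."""
    improves = aggregate(candidate) > aggregate(baseline)
    regressed = regressed_subgroups(baseline, candidate, tolerance)
    return {
        "improves_aggregate": improves,
        "regressed": regressed,
        "accept": improves and not regressed,
    }


# Illustrative numbers only: the candidate gains on the large subgroup
# and loses badly on the small one.
sizes = {"common": 90, "rare": 10}
baseline = Candidate("baseline", {"common": 0.80, "rare": 0.70}, sizes)
candidate = Candidate("candidate", {"common": 0.85, "rare": 0.40}, sizes)
report = audit(baseline, candidate)
```

In this made-up case the candidate raises the weighted aggregate yet regresses on the `rare` subgroup, so a selector that looks only at the headline number would keep it while the audit rejects it.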
