Tag
This paper benchmarks agentic AI systems on the task of loading, understanding, and reformatting fragmented neuroscience data, finding that while agents perform well on subtasks, they rarely achieve fully error-free end-to-end solutions and human oversight remains necessary.