CausaLab is a scalable environment for evaluating LLM agents on interactive causal discovery, assessing both predictive accuracy and faithful recovery of underlying causal mechanisms. Experiments reveal a gap between prediction and mechanism recovery, highlighting limits in current LLM agents as experimental causal reasoners.
This paper introduces AiraXiv, an AI-driven open-access platform designed for both human and AI scientists. It features an interactive UI and MCP-based interactions that support continuous, feedback-driven paper iteration and scalable research infrastructure.
A study of 25,000 AI scientist trials finds the agents ignore evidence 68% of the time and rarely revise hypotheses, showing popular scaffolding fixes don’t instill true scientific reasoning.