Deli Chen open-sources his AutoResearch SKILL tool and releases a survey paper on self-play, inspired by AlphaZero.
SIQ-1 Qwen3.6 is a new AI model designed for automated research and autonomous agent tasks, extending the Qwen family with enhanced agentic capabilities.
PseudoBench is a benchmark that evaluates whether LLM-based agentic auto-research systems can resist pseudoscientific narratives. Testing seven state-of-the-art agents shows that they readily produce persuasive pseudoscientific reports with near-zero refusal rates; the authors call for scientific alignment before such systems are deployed.
NVIDIA GEAR lab introduces ENPIRE, a system that uses 8 Codex agents to autonomously control a robot fleet for physical tasks like tying zip-ties and installing GPUs, demonstrating self-improving robotics research and a new 'physical scaling' phenomenon.
A curated GitHub resource that maps AI-assisted scientific research tools and papers across the full research lifecycle, from idea generation to dissemination.
Yacine conducted a 1.5-hour in-depth interview with the founders of Paradigma about using DAGs (directed acyclic graphs) as the underlying infrastructure for autonomous research, covering core topics such as how agents operate on DAGs, building large-scale public DAGs, and avoiding bad DAGs.
Interview discussing infrastructure for auto-research using DAGs, including how agents can execute DAGs and how to build large public DAGs.
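A minimal sketch of what "agents executing a DAG" can mean in practice, assuming a research workflow is modeled as named steps that each depend on earlier steps. The step names, `run_dag`, `parallel_batches`, and `stub_agent` are hypothetical illustrations, not Paradigma's system; only `graphlib.TopologicalSorter` is a real API (Python standard library, 3.9+). Each step runs only after its predecessors, and steps whose dependencies are all satisfied could be handed to agents in parallel.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical research workflow: each node maps to the nodes it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "literature_review": set(),
    "hypothesis": {"literature_review"},
    "dataset": {"literature_review"},
    "experiment": {"hypothesis", "dataset"},
    "write_report": {"experiment"},
}


def run_dag(deps: Dict[str, Set[str]],
            run_step: Callable[[str, Dict[str, str]], str]) -> Dict[str, str]:
    """Run each node once all of its predecessors have produced results."""
    results: Dict[str, str] = {}
    # static_order() yields a valid topological order and raises CycleError on cycles.
    for node in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(node, set())}
        results[node] = run_step(node, inputs)
    return results


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves whose members could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the wave finished so successors become ready
    return batches


def stub_agent(node: str, inputs: Dict[str, str]) -> str:
    # Stand-in for an agent call: records which upstream results it consumed.
    return f"{node}<-[{', '.join(sorted(inputs))}]"


if __name__ == "__main__":
    for output in run_dag(DEPENDENCIES, stub_agent).values():
        print(output)
    print(parallel_batches(DEPENDENCIES))
```

For this toy graph the waves are `literature_review`, then `dataset` and `hypothesis` together, then `experiment`, then `write_report`. A cyclic graph is rejected with `graphlib.CycleError`, which is one concrete sense in which a dependency graph can be a "bad DAG".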
AutoResearchClaw is a GitHub repository that automates the entire AI research pipeline from an idea to a full conference paper with real experiments, verified citations, and working code, outperforming previous autonomous research systems by 54.7% on a 55-topic benchmark.
This paper introduces ResearchArena, a scaffold for evaluating auto-research agents, and finds that while agent-generated papers appear competitive under manuscript-only review, artifact-aware review reveals severe failures in experimental rigor, with no paper meeting top-tier acceptance standards.
This paper introduces an auto-research framework using specialist agents to iteratively refine training recipes through an empirical loop of code execution and feedback. The system autonomously improves performance on tasks like Parameter Golf and NanoChat without human intervention by leveraging lineage feedback.
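The empirical loop described above (propose a recipe, execute it, feed the result back) can be sketched as a simple improvement-only search that keeps a parent pointer for every recipe. This is a toy under stated assumptions, not the paper's method: the `Recipe` fields, the quadratic `evaluate` function standing in for a real training run, and the way `propose` reads the lineage (reusing half of the step that produced the current best) are all hypothetical, and the specialist agents are replaced by random perturbations.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Recipe:
    log_lr: float                  # log10 of the learning rate (hypothetical knob)
    warmup: int                    # warmup steps (hypothetical knob)
    parent: Optional[int] = None   # index of the recipe this one was derived from
    score: Optional[float] = None  # feedback from the simulated run


def evaluate(r: Recipe) -> float:
    # Toy stand-in for launching a training job; peaks at lr=1e-3, warmup=200.
    return -((r.log_lr + 3.0) ** 2) - ((r.warmup - 200) / 100.0) ** 2


def lineage(history: List[Recipe], idx: int) -> List[int]:
    """Ancestor chain from the seed recipe down to history[idx]."""
    chain: List[int] = []
    cur: Optional[int] = idx
    while cur is not None:
        chain.append(cur)
        cur = history[cur].parent
    return chain[::-1]


def propose(history: List[Recipe], best_idx: int, rng: random.Random) -> Recipe:
    best = history[best_idx]
    chain = lineage(history, best_idx)
    # Lineage hint: keep half of the step that produced the current best.
    if len(chain) >= 2:
        prev = history[chain[-2]]
        d_lr, d_w = best.log_lr - prev.log_lr, best.warmup - prev.warmup
    else:
        d_lr, d_w = 0.0, 0
    return Recipe(
        log_lr=best.log_lr + 0.5 * d_lr + rng.gauss(0.0, 0.3),
        warmup=max(0, best.warmup + d_w // 2 + rng.randint(-50, 50)),
        parent=best_idx,
    )


def run_loop(steps: int, seed: int = 0) -> Tuple[List[Recipe], int]:
    rng = random.Random(seed)
    start = Recipe(log_lr=-2.0, warmup=0)
    start.score = evaluate(start)
    history, best_idx = [start], 0
    for _ in range(steps):
        cand = propose(history, best_idx, rng)
        cand.score = evaluate(cand)        # "execute" the recipe and record feedback
        history.append(cand)
        if cand.score > history[best_idx].score:
            best_idx = len(history) - 1    # accept only improvements
    return history, best_idx


if __name__ == "__main__":
    history, best_idx = run_loop(200)
    best = history[best_idx]
    print(f"best lr={10 ** best.log_lr:.2e} warmup={best.warmup} score={best.score:.3f}")
    print("lineage:", lineage(history, best_idx))
```

Keeping every attempt in `history` with a `parent` index is what makes the lineage inspectable: the chain printed at the end shows which sequence of edits led from the starting recipe to the best one found.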
Thesis Labs launched Automode, a system that autonomously conducts ML research on Optiver’s trading dataset.