Announcement of the LLMs for Scientific Discovery workshop at COLM 2026 in San Francisco, with a call for papers due June 23 and a request for reviewers.
Researchers at MIT present a paper on self-evolving AI scientists that can discover and adapt their own scientific vocabulary, using a categorical framework to mathematically quantify genuine novelty and separate discovery from mere search or retrieval.
An OpenAI model found a counterexample to an 80-year-old Erdős conjecture. On the OpenAI Podcast, researchers share the story and discuss how AI and mathematicians can collaborate on mathematical discoveries.
Google DeepMind has open-sourced Science Skills, a collection of agent skills for scientific research tasks including genomics, structural biology, and cheminformatics, to accelerate agentic workflows with scientific grounding and higher token efficiency.
Ex-DeepMind researchers raised $50M for Inherent, building a platform called Faraday that uses self-improving AI to determine which scientific questions are worth asking, aiming to enable discoveries beyond human reach.
This article argues that while AI excels at pattern recognition and hypothesis generation, scientific and economic progress requires grounded interaction with reality and institutional execution, emphasizing the need for human-AI collaboration.
EvoSci proposes a bio-inspired multi-agent framework that integrates evolutionary algorithms with knowledge graph modeling to iteratively generate, evaluate, and refine research ideas, achieving top performance in peer-review evaluations.
LLM-AutoSciLab is a closed-loop framework that uses LLMs to iteratively generate hypotheses, select informative experiments, and refine mechanisms, achieving superior accuracy and sample efficiency on physics and biology benchmarks over prior static methods.
The paper introduces the Multi-Persona Debate System (MPDS), a literature-grounded framework that uses LLMs, persona induction, and structured multi-agent debate to automate the generation of scientific hypotheses, with evaluations in battery materials research showing improved hypothesis quality and cross-perspective integration.
Introduces The Singularity Gate, a benchmark that tests whether frontier AI models can predict paradigm-shifting scientific discoveries published after their training cutoff. The current top score is 17.75% on partial credit and 0% on fully correct predictions.
This survey examines the emerging field of AI-powered research automation (AutoResearch), analyzing how AI systems are moving from isolated task assistance to full workflow-level scientific discovery. It defines a spectrum from human-steered 'Vibe Research' to AI-led systems, and proposes five evaluation dimensions for scientific credibility.
A new preprint introduces the concept of 'Alien Space of Science' – research directions that are coherent but cognitively unavailable to current communities – and proposes a method to sample such directions using idea atoms from LLM papers, showing it can explore 3.5-7x broader idea spaces without sacrificing coherence.
A paper proves that elementary functions such as sin, exp, log, and sqrt can all be generated from a single binary operator, eml(x,y)=exp(x)-ln(y), much as NAND gates unify digital logic. This could simplify AI architectures by enabling a single trainable node for continuous mathematics.
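To make the NAND analogy concrete, here is a minimal Python sketch that recovers exp, ln, and subtraction using only eml. It is an illustration, not the paper's own construction. It assumes the constant 1 is available alongside the operator and restricts inputs to positive reals. The helper names (`exp_via_eml`, `ln_via_eml`, `sub_via_eml`) are hypothetical.

```python
import math


def eml(x: float, y: float) -> float:
    """The operator from the summary: eml(x, y) = exp(x) - ln(y), for y > 0."""
    return math.exp(x) - math.log(y)


def exp_via_eml(x: float) -> float:
    # ln(1) = 0, so eml(x, 1) = exp(x).
    return eml(x, 1.0)


def ln_via_eml(z: float) -> float:
    # Step 1: eml(1, z)           = e - ln(z)
    # Step 2: eml(e - ln(z), 1)   = exp(e - ln(z)) = e^e / z
    # Step 3: eml(1, e^e / z)     = e - (e - ln(z)) = ln(z)
    return eml(1.0, eml(eml(1.0, z), 1.0))


def sub_via_eml(x: float, y: float) -> float:
    # eml(ln(x), exp(y)) = exp(ln(x)) - ln(exp(y)) = x - y, valid for x > 0.
    return eml(ln_via_eml(x), exp_via_eml(y))


if __name__ == "__main__":
    # Compare each composed function against the standard library.
    print(exp_via_eml(0.5), math.exp(0.5))
    print(ln_via_eml(2.0), math.log(2.0))
    print(sub_via_eml(5.0, 3.0), 5.0 - 3.0)
```

Each helper is built purely from nested eml calls and the constant 1, which mirrors the single-gate idea. The results match the standard library only up to floating-point rounding, and the nesting can overflow or lose precision for extreme inputs.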
This paper explores teaching language models to forecast the empirical success of research ideas by comparing pairs of ideas. Using a dataset of 11,488 idea pairs from PapersWithCode, the authors show that supervised fine-tuning (SFT) raises accuracy to 77.1%, outperforming GPT-5, while reinforcement learning with verifiable rewards reaches 71.35% and produces interpretable reasoning.
A survey paper examining the transition of AI from task-specific assistants to workflow-level research automators, defining AutoResearch as the spectrum of AI-powered scientific workflow automation and analyzing challenges in autonomy, reproducibility, and accountability.
Introduces ArtifactLinker, a framework that models HuggingFace as an artifact graph and uses GNNs and LLM agents to automatically discover state-of-the-art models and research insights.
Google announces Empirical Research Assistance (ERA), an AI tool using Gemini to write and optimize scientific code, now published in Nature and being rolled out as part of Gemini for Science to help scientists worldwide accelerate computational discovery.
This paper proposes a scalable supervised fine-tuning method for training language models to propose research hypotheses across disciplines. It has been accepted at ICML 2026, and the code is open source.
AutoResearchClaw is a multi-agent autonomous research system that improves scientific discovery through structured debate, self-healing execution, and human collaboration, outperforming previous systems on the ARC-Bench benchmark by 54.7%.
This paper presents a case study using an LLM-driven tree search algorithm (ERA) combined with a coding agent (AntiGravity) to autonomously generate high-efficiency three-dimensional photovoltaic structures, overcoming limitations of flat solar panels at mid-latitudes. The workflow includes iterative patching to eliminate reward hacking and discovers improved designs under various constraints.