Towards Diverse Scientific Hypothesis Search with Large Language Models
Summary
This paper proposes an evolutionary framework inspired by parallel tempering that uses multi-temperature sampling and information exchange to improve the diversity and quality of scientific hypotheses generated by large language models, demonstrated across molecular, equation, and algorithm discovery.
Cached at: 06/11/26, 05:35 PM
Source: https://huggingface.co/papers/2606.10587
Abstract
Evolutionary framework for hypothesis generation that improves diversity and quality through multi-temperature sampling and information exchange across search levels.
Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis since validation can be noisy and expensive, and scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty for the best solutions. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose \ours, an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
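For readers unfamiliar with the classical parallel tempering algorithm that the abstract cites as inspiration, the following is a minimal sketch of that textbook algorithm on a toy one-dimensional energy function. It is not the paper's method: the paper searches over LLM-generated hypotheses, whereas this example uses numeric states, and the function names, the double-well energy, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def energy(x):
    # Hypothetical toy objective: a double well with minima near x = -1 and x = +1.
    return (x * x - 1.0) ** 2


def swap_probability(e_i, e_j, t_i, t_j):
    # Standard replica-exchange acceptance rule:
    # min(1, exp((1/t_i - 1/t_j) * (e_i - e_j))).
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def parallel_tempering(temps, steps=2000, step_size=0.5, seed=0):
    # temps is ordered from coldest (index 0) to hottest.
    rng = random.Random(seed)
    states = [rng.uniform(-2.0, 2.0) for _ in temps]
    samples = []  # states visited by the coldest replica
    for _ in range(steps):
        # Local Metropolis move at each temperature level.
        for k, t in enumerate(temps):
            proposal = states[k] + rng.gauss(0.0, step_size)
            d = energy(proposal) - energy(states[k])
            if d <= 0 or rng.random() < math.exp(-d / t):
                states[k] = proposal
        # Information exchange: try to swap one random adjacent pair of levels.
        k = rng.randrange(len(temps) - 1)
        p = swap_probability(
            energy(states[k]), energy(states[k + 1]), temps[k], temps[k + 1]
        )
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
        samples.append(states[0])
    return samples


if __name__ == "__main__":
    samples = parallel_tempering([0.1, 0.5, 2.0])
    right_well = sum(1 for x in samples if x > 0)
    print(f"cold-replica samples in right well: {right_well} of {len(samples)}")
```

In this sketch the hot replicas cross the barrier between the two wells easily, and the swap step lets the cold replica inherit those states, which is the general mechanism by which tempering preserves diversity under a fixed number of energy evaluations.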
Similar Articles
Evolution through large models
This paper demonstrates that large language models trained on code can significantly enhance genetic programming mutation operators, enabling the generation of hundreds of thousands of functional Python programs for robot design in the Sodarace domain without prior training data. The approach, called Evolution through Large Models (ELM), combines LLMs with MAP-Elites to bootstrap new conditional models for context-specific artifact generation.
DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
DEI introduces a distributed Quality-Diversity search framework using heterogeneous LLMs as mutation operators, showing that model diversity improves performance over homogeneous parallel approaches. Evaluated on the Core War domain, a four-node heterogeneous ensemble achieves significant gains in QD-Score and coverage.
Discovering Reinforcement Learning Interfaces with Large Language Models
This paper introduces LIMEN, an LLM-guided evolutionary framework that automatically discovers reinforcement learning interfaces by jointly optimizing observation mappings and reward functions from raw simulator states. The approach reduces manual engineering effort and demonstrates that co-designing observations and rewards outperforms optimizing either component alone.
Where You Inject Diversity Matters: A Unified Framework for Diverse Generation
This paper introduces a unified framework for test-time diverse generation in large language models, categorizing methods by where diversity is injected (surface-level vs. specification-level). It proposes specification-level methods that generate diverse intermediate specifications, achieving better output diversity across five open-ended tasks and four backbone models while maintaining quality.
Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
Darwin Family is a training-free framework for evolutionary merging of large language models via gradient-free weight-space recombination, achieving strong reasoning performance without additional training. The method introduces MRI-Trust Fusion and cross-architecture breeding to combine heterogeneous models.