Towards Diverse Scientific Hypothesis Search with Large Language Models

Hugging Face Daily Papers

Summary

This paper proposes an evolutionary framework, inspired by parallel tempering, that uses multi-temperature sampling and information exchange to improve the diversity and quality of scientific hypotheses generated by large language models. The approach is demonstrated on molecular, equation, and algorithm discovery.

Large language models (LLMs) are increasingly used to accelerate scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings the goal is not to identify a single best hypothesis: validation can be noisy and expensive, so scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty about which solutions are best. However, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during search leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose \ours, an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validation.
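
The parallel tempering idea can be illustrated with a minimal sketch of classical replica exchange. This is not the paper's LLM-based framework; it only shows the two mechanisms the abstract names. Assumptions: hypotheses are reduced to real numbers, `toy_score` is a hypothetical two-peak objective standing in for validation, `toy_propose` is a hypothetical Gaussian mutation standing in for an LLM proposal step, and every score call counts against the validation budget. Each chain accepts moves with a Metropolis rule at its own temperature T, and adjacent chains i, j periodically swap states with probability min(1, exp((f_j - f_i)(1/T_i - 1/T_j))).

```python
import math
import random
from typing import Callable, List, Tuple


def parallel_tempering(
    score: Callable[[float], float],
    propose: Callable[[float, random.Random], float],
    temperatures: List[float],
    budget: int,
    swap_every: int = 5,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Replica-exchange search; returns every (hypothesis, score) evaluated.

    Each call to `score` counts as one validation and consumes `budget`.
    """
    rng = random.Random(seed)
    temps = sorted(temperatures)  # temps[0] is the coldest (most greedy) chain
    # One current hypothesis per temperature level.
    states = [rng.uniform(-5.0, 5.0) for _ in temps]
    scores = [score(x) for x in states]
    history = list(zip(states, scores))
    used = len(temps)
    step = 0
    while used < budget:
        # Within-chain move: Metropolis acceptance at each chain's temperature.
        for k, t in enumerate(temps):
            if used >= budget:
                break
            cand = propose(states[k], rng)
            s = score(cand)
            used += 1
            history.append((cand, s))
            if s >= scores[k] or rng.random() < math.exp((s - scores[k]) / t):
                states[k], scores[k] = cand, s
        step += 1
        # Information exchange: attempt swaps between adjacent temperature levels.
        # Scores are cached, so swaps consume no validation budget.
        if step % swap_every == 0:
            for i in range(len(temps) - 1):
                j = i + 1
                log_a = (scores[j] - scores[i]) * (1.0 / temps[i] - 1.0 / temps[j])
                if log_a >= 0 or rng.random() < math.exp(log_a):
                    states[i], states[j] = states[j], states[i]
                    scores[i], scores[j] = scores[j], scores[i]
    return history


def toy_score(x: float) -> float:
    # Hypothetical objective with two equally good "hypotheses" at x = -2 and x = +2.
    return max(math.exp(-(x - 2) ** 2), math.exp(-(x + 2) ** 2))


def toy_propose(x: float, rng: random.Random) -> float:
    # Hypothetical stand-in for an LLM mutation: a small Gaussian perturbation.
    return x + rng.gauss(0.0, 0.5)


history = parallel_tempering(toy_score, toy_propose, [0.01, 0.1, 1.0], budget=300)
top10 = sorted(history, key=lambda h: h[1], reverse=True)[:10]
```

The design choice worth noting is that the cold chain behaves almost greedily while the hot chain accepts worse moves freely; swaps let good states found by hot chains reach the cold chain without forcing every chain to converge. The whole `history`, not just the final states, is the pool from which a diverse top set can be drawn.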

Cached at: 06/11/26, 05:35 PM

Source: https://huggingface.co/papers/2606.10587

Abstract

Evolutionary framework for hypothesis generation that improves diversity and quality through multi-temperature sampling and information exchange across search levels.
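
The page does not say which diversity measure the paper reports. As an assumption for illustration, a common choice is the mean pairwise distance over a hypothesis set, which makes diversity collapse measurable: a population that has converged on one hypothesis scores 0. The `jaccard_distance` token-overlap function below is a hypothetical stand-in; a real molecular or equation benchmark would use a domain-specific distance.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

H = TypeVar("H")


def mean_pairwise_distance(items: Sequence[H], dist: Callable[[H, H], float]) -> float:
    """Average distance over all unordered pairs; 0.0 for fewer than two items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def jaccard_distance(a: str, b: str) -> float:
    # Hypothetical distance between text hypotheses: 1 - |shared tokens| / |all tokens|.
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return 0.0 if not union else 1.0 - len(ta & tb) / len(union)


collapsed = ["y = a*x + b", "y = a*x + b", "y = a*x + b"]
diverse = ["y = a*x + b", "y = a * exp(b*x)", "y = a * sin(x) + c"]
collapsed_div = mean_pairwise_distance(collapsed, jaccard_distance)
diverse_div = mean_pairwise_distance(diverse, jaccard_distance)
```

Reporting such a diversity value alongside the best score, at a fixed number of validations, is one way to compare search methods on both axes at once.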

Similar Articles

Evolution through large models

OpenAI Blog

This paper demonstrates that large language models trained on code can significantly enhance genetic programming mutation operators, enabling the generation of hundreds of thousands of functional Python programs for robot design in the Sodarace domain without prior training data. The approach, called Evolution through Large Models (ELM), combines LLMs with MAP-Elites to bootstrap new conditional models for context-specific artifact generation.

DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Hugging Face Daily Papers

DEI introduces a distributed Quality-Diversity search framework using heterogeneous LLMs as mutation operators, showing that model diversity improves performance over homogeneous parallel approaches. Evaluated on the Core War domain, a four-node heterogeneous ensemble achieves significant gains in QD-Score and coverage.

Discovering Reinforcement Learning Interfaces with Large Language Models

Hugging Face Daily Papers

This paper introduces LIMEN, an LLM-guided evolutionary framework that automatically discovers reinforcement learning interfaces by jointly optimizing observation mappings and reward functions from raw simulator states. The approach reduces manual engineering effort and demonstrates that co-designing observations and rewards outperforms optimizing either component alone.

Where You Inject Diversity Matters: A Unified Framework for Diverse Generation

arXiv cs.CL

This paper introduces a unified framework for test-time diverse generation in large language models, categorizing methods by where diversity is injected (surface-level vs. specification-level). It proposes specification-level methods that generate diverse intermediate specifications, achieving better output diversity across five open-ended tasks and four backbone models while maintaining quality.