Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Hugging Face Daily Papers 06/01/26, 12:00 AM

Summary

SITA (Scalable Inference-Time Annealing) introduces a method for efficiently sampling molecular Boltzmann distributions by retraining flow-based models along a temperature ladder using energy-based surrogate likelihoods, avoiding costly divergence computations. The approach achieves state-of-the-art performance on Alanine Dipeptide and Tripeptide benchmarks.
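The relations below are a minimal sketch of where the costly divergence comes from and how a surrogate likelihood can sidestep it. They assume the sampler is a continuous flow with velocity field $v_t$ and base density $p_0$, the target at the next, lower temperature $T'$ is the Boltzmann density with potential $U$, and the surrogate is an energy-based model $E_\theta$ whose normalized density $e^{-E_\theta}/Z_\theta$ approximates the flow's density $q_1$. These symbols are illustrative, and the exact estimator used in SITA may differ.

```latex
\begin{aligned}
\log q_1(x_1) &= \log p_0(x_0) - \int_0^1 \nabla\cdot v_t(x_t)\,\mathrm{d}t, \\
w(x) &= \frac{\exp\!\big(-U(x)/k_B T'\big)}{q_1(x)}, \\
\log w(x) &\approx -\frac{U(x)}{k_B T'} + E_\theta(x) + \log Z_\theta
  \quad\text{if } q_1(x) \approx e^{-E_\theta(x)}/Z_\theta, \\
\bar w_i &= \frac{w(x_i)}{\sum_{j=1}^{N} w(x_j)}.
\end{aligned}
```

The first line is the instantaneous change-of-variables formula. Evaluating it exactly requires the trace of the velocity Jacobian along every sample's trajectory, either one backward pass per dimension or a stochastic Hutchinson estimate, and this cost grows with system size. In the third line, $\log Z_\theta$ is shared by all samples, so it cancels in the self-normalized weights of the last line. The same holds for the Boltzmann normalizer.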

A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Generative modeling approaches have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is to iteratively fine-tune diffusion models along a temperature ladder, where training data is generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing the divergence of the score field to estimate importance weights, which makes them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures, using an energy-based model to provide fast surrogate likelihoods. We demonstrate state-of-the-art performance on both Alanine Dipeptide and Alanine Tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git
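The loop below is a toy sketch of how an annealing ladder might be driven by surrogate importance weights. It rests on several assumptions. A 1-D double-well potential replaces the molecular energy, and a Gaussian replaces the flow-based model. The Gaussian's own negative log-density, up to a constant, replaces the learned energy-based model $E_\theta$, and "retraining" is a Gaussian maximum-likelihood fit to the resampled points. All names (`anneal`, `target_energy`, `log_importance_weights`, and so on) are hypothetical and do not come from the SITA repository. The sketch shows only the sample, reweight, resample, and refit mechanics, not the authors' implementation.

```python
import numpy as np


def target_energy(x):
    # Toy 1-D double-well potential U(x), standing in for a molecular energy.
    return (x ** 2 - 1.0) ** 2


def log_importance_weights(x, e_theta, kT):
    # log w = -U(x)/kT - log q(x), with log q(x) replaced by -E_theta(x);
    # the surrogate's unknown normalizer cancels after self-normalization.
    return -target_energy(x) / kT + e_theta


def normalize(logw):
    # Subtract the max before exponentiating for numerical stability.
    w = np.exp(logw - np.max(logw))
    return w / w.sum()


def effective_sample_size(w):
    # Kish ESS of normalized weights: N for uniform weights, 1 for a single spike.
    return 1.0 / np.sum(w ** 2)


def anneal(kTs, n=5000, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 2.0  # hypothetical initial "model": a broad Gaussian
    history = []
    for kT in kTs:
        # 1) Draw samples from the current model (a flow in SITA, a Gaussian here).
        x = rng.normal(mu, sigma, size=n)
        # 2) Surrogate energy: Gaussian negative log-density up to a constant,
        #    a stand-in for a learned energy-based model E_theta.
        e_theta = (x - mu) ** 2 / (2.0 * sigma ** 2)
        w = normalize(log_importance_weights(x, e_theta, kT))
        history.append((float(kT), float(effective_sample_size(w))))
        # 3) Resample by weight to build training data for this rung.
        x_train = x[rng.choice(n, size=n, replace=True, p=w)]
        # 4) "Retrain" on the resampled data (Gaussian MLE as a stand-in).
        mu, sigma = float(np.mean(x_train)), float(np.std(x_train))
    return mu, sigma, history


if __name__ == "__main__":
    ladder = np.geomspace(2.0, 0.25, num=6)  # geometric ladder, hot to cold
    mu, sigma, history = anneal(ladder)
    for kT, ess in history:
        print(f"kT={kT:.3f}  ESS={ess:.1f}")
    print(f"final model: mu={mu:.3f}, sigma={sigma:.3f}")
```

Tracking the effective sample size at each rung is a common heuristic for judging whether the ladder is fine enough. A sharp ESS drop between two temperatures suggests inserting an intermediate rung. Because the Gaussian surrogate here is exact for the Gaussian proposal, the toy says nothing about how well a learned energy-based model approximates a real flow's likelihood.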


Similar Articles

Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo

arXiv cs.LG

This paper proposes new discrete-time approximations for stochastic gradient Langevin dynamics (SGLD) with and without momentum, enabling accurate predictions of stationary covariance, iterate average covariance, and integrated autocorrelation time. The method provides improved tuning guidance for large-sample uncertainty quantification, especially under model misspecification.

Faster LLM Inference via Sequential Monte Carlo

arXiv cs.CL

This paper proposes Sequential Monte Carlo Speculative Decoding (SMC-SD), a method that accelerates LLM inference by replacing token-level rejection in speculative decoding with importance-weighted resampling over draft particles. It achieves a 2.36× speedup over standard speculative decoding and 5.2× over autoregressive decoding at the cost of a 3% accuracy loss.

Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3

Hugging Face Daily Papers

This paper analyzes inference-time optimization techniques for AIMO 3 and finds that model capability dominates over prompt engineering and diverse sampling strategies. The study shows that high-temperature sampling already decorrelates errors maximally, leaving no room for prompt-based improvements, and identifies a 6-point selection-loss gap between individual-model pass@20 and majority-voting consensus.

Evaluation-driven Scaling for Scientific Discovery

Hugging Face Daily Papers

The SimpleTES framework scales evaluation-driven discovery loops across 21 scientific problems, yielding 2× speedups on LASSO, 24.5% reductions in quantum gate count, and new Erdős constructions, while also enabling trajectory-level post-training of models.

Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation