Scalable Inference-Time Annealing with Surrogate Likelihood Estimators
Summary
SITA (Scalable Inference-Time Annealing) introduces a method for efficiently sampling molecular Boltzmann distributions by retraining flow-based models along a temperature ladder using energy-based surrogate likelihoods, avoiding costly divergence computations. The approach achieves state-of-the-art performance on Alanine Dipeptide and Tripeptide benchmarks.
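As a rough illustration of the temperature-ladder idea only (not the paper's algorithm), the sketch below moves samples from a hot Boltzmann distribution to a colder one by importance reweighting and resampling at each rung. The harmonic `energy`, the ladder `betas`, and all function names are assumptions chosen for illustration. According to the summary, SITA instead retrains a flow-based model at each rung and relies on a learned energy-based surrogate likelihood, neither of which is shown here.

```python
import numpy as np

def energy(x):
    """Hypothetical harmonic potential U(x) = x^2 / 2 (stand-in for a molecular energy)."""
    return 0.5 * x**2

def anneal_weights(u, beta_from, beta_to):
    """Self-normalized importance weights from exp(-beta_from*U) to exp(-beta_to*U)."""
    log_w = -(beta_to - beta_from) * u
    log_w = log_w - log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(w):
    """Kish effective sample size of normalized weights."""
    return 1.0 / np.sum(w**2)

def anneal_along_ladder(x, betas, rng):
    """Reweight and resample at each rung of an inverse-temperature ladder."""
    for beta_from, beta_to in zip(betas[:-1], betas[1:]):
        w = anneal_weights(energy(x), beta_from, beta_to)
        idx = rng.choice(len(x), size=len(x), p=w)
        x = x[idx]
    return x

# Assumed ladder: inverse temperatures from hot (0.5) to the target (1.0).
rng = np.random.default_rng(0)
betas = [0.5, 0.7, 1.0]
# For the harmonic potential, the Boltzmann distribution at beta is N(0, 1/beta).
x_hot = rng.normal(0.0, np.sqrt(1.0 / betas[0]), size=20_000)
x_cold = anneal_along_ladder(x_hot, betas, rng)
print("variance at target temperature:", x_cold.var())
```

For this toy potential the exact target variance is 1/beta, so the printed value can be checked against 1.0. Small steps between rungs keep the effective sample size high, which is the usual motivation for a ladder rather than a single large temperature jump.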
Similar Articles
Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo
This paper proposes new discrete-time approximations for stochastic gradient Langevin dynamics (SGLD) with and without momentum, enabling accurate predictions of stationary covariance, iterate average covariance, and integrated autocorrelation time. The method provides improved tuning guidance for large-sample uncertainty quantification, especially under model misspecification.
Faster LLM Inference via Sequential Monte Carlo
This paper proposes Sequential Monte Carlo Speculative Decoding (SMC-SD), a method that accelerates LLM inference by replacing token-level rejection in speculative decoding with importance-weighted resampling over draft particles, achieving a 2.36× speedup over standard speculative decoding and 5.2× over autoregressive decoding while keeping accuracy loss to 3%.
Model Capability Dominates: Inference-Time Optimization Lessons from AIMO 3
This paper analyzes inference-time optimization techniques for AIMO 3, finding that model capability dominates over prompt engineering and diverse sampling strategies. The study reveals that high-temperature sampling already decorrelates errors maximally, leaving no room for prompt-based improvements, and identifies a 6-point selection loss gap between individual model pass@20 and majority voting consensus.
Evaluation-driven Scaling for Scientific Discovery
The SimpleTES framework scales evaluation-driven discovery loops across 21 scientific problems, yielding 2× speedups on LASSO, 24.5% quantum gate reductions, and new Erdős constructions, while enabling trajectory-level model post-training.
Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation
This paper introduces Temporal-Spatial Parallel Decoding (TSPD) and Confidence Extrapolation (CE) to accelerate inference in diffusion-based large language models by dynamically deciding when tokens have converged and forecasting logit trends, reducing unnecessary denoising steps while preserving output quality.