test-time-search

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Reddit r/LocalLLaMA · 2026-05-22

This paper introduces Vector Policy Optimization (VPO), a reinforcement learning algorithm that trains LLMs to produce diverse solutions by optimizing across multiple reward dimensions, significantly improving test-time search performance compared to scalar RL baselines.

(1D) Ordered Tokens Enable Efficient Test-Time Search

Hugging Face Daily Papers · 2026-04-16

This paper investigates how 1D coarse-to-fine token structures in autoregressive models improve test-time search efficiency compared to classical 2D grid tokenization. The authors show that such ordered tokens enable better test-time scaling and even training-free text-to-image generation when guided by image-text verifiers.
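
As a rough illustration of verifier-guided test-time search in general, the sketch below shows plain best-of-N selection: draw several candidates, score each with a verifier, and keep the highest-scoring one. This is not the method from either paper listed here. The names `best_of_n`, `sample`, and `score` are hypothetical, and the toy sampler and verifier are stand-ins for a generative model and an image-text verifier.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(
    sample: Callable[[], T],
    score: Callable[[T], float],
    n: int,
) -> Tuple[T, float]:
    """Draw n candidates and return the one the verifier scores highest.

    `sample` and `score` are hypothetical callables supplied by the caller:
    `sample` stands in for one generation from a model, and `score` stands
    in for a verifier that rates a candidate (higher is better).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[T] = [sample() for _ in range(n)]
    # Score each candidate exactly once, then pick the best pair.
    scored = [(candidate, score(candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (assumptions for illustration only): candidates are
# integers and the verifier prefers values close to 42.
_toy_outputs: Iterator[int] = iter([1, 40, 90])


def toy_sample() -> int:
    return next(_toy_outputs)


def toy_verifier(x: int) -> float:
    return -abs(x - 42)


if __name__ == "__main__":
    best, best_score = best_of_n(toy_sample, toy_verifier, n=3)
    print(best, best_score)
```

The search can only return a candidate that was actually sampled, so a larger or more varied candidate pool gives the verifier more to choose from.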
