Tag
This paper introduces LoopCoder-v2, a family of 7B-parameter parallel loop transformers for code generation, and studies the optimal number of loops, finding that two loops yield significant gains while additional loops degrade performance.
This paper introduces Test-Time Personalization (TTP), a framework that improves LLM personalization by scaling inference-time computation through candidate sampling and reward-based selection. It diagnoses failure modes in standard reward models and proposes a probabilistic personalized reward model to mitigate them.
This paper introduces PaT (Planning-after-Trial), an adaptive test-time computation strategy for code generation that reduces inference costs by approximately 69% while maintaining performance comparable to larger models.
This paper introduces 'prefix consistency,' a method that weights candidate responses in Chain-of-Thought reasoning based on answer reproduction rates during trace regeneration. It achieves high accuracy with significantly fewer tokens than standard majority voting across various reasoning models and benchmarks.
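
The weighting idea in the prefix-consistency summary can be illustrated with a minimal sketch. The summary does not say how prefixes are chosen or how weights are combined, so the following are assumptions: each candidate's weight is the fraction of regenerations from its trace prefix that reproduce its answer, and weights are summed per answer. The names `prefix_consistency_vote`, `regenerate`, and `n_regens` are hypothetical, and `regenerate` stands in for a model call.

```python
from collections import defaultdict


def prefix_consistency_vote(candidates, regenerate, n_regens=4):
    """Pick an answer by reproduction-weighted voting (illustrative sketch).

    candidates: list of (trace_prefix, answer) pairs.
    regenerate: hypothetical callable mapping a trace prefix to a new answer.
    """
    scores = defaultdict(float)
    for prefix, answer in candidates:
        # Assumed weight: share of regenerations that reproduce the answer.
        regens = [regenerate(prefix) for _ in range(n_regens)]
        weight = sum(r == answer for r in regens) / n_regens
        scores[answer] += weight
    # Ties resolve to the answer seen first among the candidates.
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Stub standing in for a model: only prefix "p1" reproduces its answer.
stub_answers = {"p1": "42", "p2": "9", "p3": "9"}
candidates = [("p1", "42"), ("p2", "17"), ("p3", "17")]
best, scores = prefix_consistency_vote(candidates, stub_answers.get)
print(best, scores)
```

In this toy input, plain majority voting would pick "17" (two of three candidates), while the reproduction-weighted vote picks "42" because neither "17" trace reproduces its answer under the stub.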