test-time-scaling

BitCal-TTS: Bit-Calibrated Test-Time Scaling for Quantized Reasoning Models

arXiv cs.AI · yesterday

This paper introduces BitCal-TTS, a runtime controller that improves accuracy and reduces premature halting in quantized reasoning models by calibrating confidence signals during test-time scaling.

Stream-T1: Test-Time Scaling for Streaming Video Generation

Hugging Face Daily Papers · 3d ago

Stream-T1 is a proposed framework for test-time scaling in streaming video generation, improving temporal consistency and quality through mechanisms like noise propagation and reward pruning. The paper addresses the high computational costs of existing diffusion-based methods by leveraging chunk-level synthesis.

FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents

arXiv cs.CL · 2026-04-20

FS-Researcher introduces a file-system-based dual-agent framework that lets LLM agents conduct deep research beyond context-window limits by using persistent external memory as a shared workspace. The framework achieves state-of-the-art results on research benchmarks and demonstrates effective test-time scaling by allocating computation to evidence collection.

AgentV-RL: Scaling Reward Modeling with Agentic Verifier

arXiv cs.CL · 2026-04-20

AgentV-RL introduces an Agentic Verifier framework that enhances reward modeling through bidirectional verification, using tool-augmented forward and backward agents, and achieves a 25.2% improvement over state-of-the-art outcome reward models (ORMs). The approach addresses error propagation and grounding issues in verifiers for complex reasoning tasks by combining multi-turn deliberative processes with reinforcement learning.

Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity

Hugging Face Daily Papers · 2026-04-19

This study shows that LLM agents frequently discover complete solutions in their environments but almost never use them, revealing a missing "environmental curiosity" capability that is critical for open-ended tasks.

Scaling Test-Time Compute for Agentic Coding

Hugging Face Daily Papers · 2026-04-16

This paper presents a test-time scaling framework for agentic coding that compresses rollout trajectories into structured summaries and uses recursive voting/PDR to raise Claude-4.5-Opus to 77.6% on SWE-Bench Verified.
