Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Hugging Face Daily Papers

Summary

ACTS (Agentic Chain-of-Thought Steering) formulates LLM reasoning control as a Markov decision process where a controller agent adaptively steers a frozen reasoner during inference using reasoning strategies and steering phrases. The approach achieves comparable accuracy to full-thinking models with significant token savings, enabling controllable accuracy-efficiency trade-offs.

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Existing efficient reasoning methods control thinking length by shortening, early-stopping, or compressing traces, leaving how the model thinks implicit. In this paper, we propose Agentic Chain-of-Thought Steering (ACTS), which formulates reasoning steering as a Markov decision process in which a controller agent adaptively steers a frozen reasoner during inference. At each step, the controller observes the reasoning trace and the remaining thinking budget, then issues a steering action consisting of a reasoning strategy and a steering phrase that initiates the next reasoner step. This enables budget-aware strategy control for efficient reasoning while preserving the reasoner's generation continuity. We initialize the controller agent from synthetic steering trajectories that we construct with multi-budget augmentation, and further optimize it via reinforcement learning with budget-conditioned reward shaping. Experiments across multiple benchmarks show that ACTS matches full-thinking performance with substantial token savings and enables controllable accuracy-efficiency trade-offs across different reasoners and tasks. The code is available at https://github.com/Andree-9/ACTS.
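
The control loop described in the abstract (observe the trace and the remaining budget, emit a strategy plus a steering phrase, let the frozen reasoner continue from that phrase) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `State`, `SteeringAction`, `steer`, `toy_controller`, and `toy_reasoner`, the strategy labels, the whitespace token count, and the "conclude" stopping rule are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SteeringAction:
    strategy: str  # hypothetical label, e.g. "decompose" or "conclude"
    phrase: str    # text that opens the next reasoner step


@dataclass
class State:
    question: str
    remaining_budget: int                       # thinking tokens left
    trace: List[str] = field(default_factory=list)
    actions: List[SteeringAction] = field(default_factory=list)


# Assumed interfaces: the controller maps a state to an action; the frozen
# reasoner maps (prompt, max_new_tokens) to a continuation string.
Controller = Callable[[State], SteeringAction]
Reasoner = Callable[[str, int], str]


def count_tokens(text: str) -> int:
    """Crude whitespace token count, standing in for a real tokenizer."""
    return len(text.split())


def steer(question: str, controller: Controller, reasoner: Reasoner,
          budget: int, step_tokens: int = 8) -> State:
    """Run the controller/reasoner loop until the budget is spent or the
    controller picks the (assumed) terminal strategy "conclude"."""
    state = State(question=question, remaining_budget=budget)
    while state.remaining_budget > 0:
        action = controller(state)
        # The steering phrase is appended to the existing trace, so the
        # reasoner continues its own generation rather than restarting.
        prompt = question + "\n" + " ".join(state.trace) + " " + action.phrase
        phrase_cost = count_tokens(action.phrase)
        max_new = max(0, min(step_tokens, state.remaining_budget - phrase_cost))
        step = action.phrase + reasoner(prompt, max_new)
        state.trace.append(step)
        state.actions.append(action)
        # Charge at least one token per step so the loop always terminates.
        state.remaining_budget -= max(1, count_tokens(step))
        if action.strategy == "conclude":
            break
    return state


def toy_controller(state: State) -> SteeringAction:
    """Budget-aware rule standing in for a learned policy."""
    if state.remaining_budget <= 12:
        return SteeringAction("conclude", "So the answer is ")
    return SteeringAction("decompose", "First, consider ")


def toy_reasoner(prompt: str, max_new_tokens: int) -> str:
    """Stub for a frozen LLM: emits placeholder tokens up to the limit."""
    return " ".join(["tok"] * max_new_tokens)


if __name__ == "__main__":
    final = steer("What is 2 + 2?", toy_controller, toy_reasoner, budget=30)
    print([a.strategy for a in final.actions], final.remaining_budget)
```

In this sketch the budget enters only through the controller's observation, which is one simple way to make the same policy behave differently under different budgets. The paper's learned controller and reward shaping would replace the hand-written rule in `toy_controller`.
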

Similar Articles

Adaptive Latent Agentic Reasoning

arXiv cs.CL

This paper introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework for LLM agents that uses compact latent reasoning for routine turns and selectively escalates to explicit chain-of-thought for harder decisions, achieving up to 84.6% token reduction while maintaining task accuracy.

Manifold-Guided Attention Steering

arXiv cs.LG

Proposes Manifold-Guided Attention Steering (MAGS), a trajectory-aware inference-time intervention that corrects reasoning errors in LLMs by projecting attention outputs back to a learned correctness manifold when deviation exceeds a threshold, outperforming static steering methods across math, code, and molecular benchmarks.

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

arXiv cs.AI

This paper investigates how chain-of-thought reasoning in large reasoning models complicates activation-based steering of refusal behavior. Experiments on DeepSeek-R1-Distill-LLaMA-8B show that refusal is jointly encoded in residual stream activations and the CoT trace, making models more robust to activation-level interventions but exposing the CoT as an alternative attack surface.