Tag
ProactiveLLM introduces a method that lets streaming LLMs actively decide when to generate output based on endogenous cues. It combines mask-based streaming modeling with synchronized privileged self-distillation and reduces latency without requiring external annotations.
This paper introduces AIPO, a reinforcement learning framework that enhances LLM reasoning by letting the model actively consult collaborative agents during exploration, helping it overcome its own capability boundaries.