Tag
QGF is an RL algorithm that improves policies at test time by using a value gradient to guide a pre-trained flow policy, avoiding training-time instability while maintaining competitive performance.
This paper proposes Demo2Reward, a test-time prompt optimization technique for VLM reward models that uses a few expert demonstrations, significantly reducing false positives and improving policy learning in robotics without additional model training.
This paper develops a PAC-Bayesian framework for test-time adaptation that uses MMD-balls as credal sets, providing formal generalization bounds and separating epistemic from aleatoric uncertainty under distribution shift.
This paper proposes a hierarchical variational policy framework for reward-guided diffusion that enables high-quality sampling at reduced inference cost, achieving a strong quality-speed tradeoff on tasks such as super-resolution.
SOLAR proposes a self-optimizing autonomous agent that leverages parameter-level meta-learning and multi-level reinforcement learning to enable lifelong adaptation of LLMs to non-stationary data streams, outperforming baselines on reasoning tasks.
This paper proposes Federated Nested Learning (FedNL), a framework that reformulates federated learning as a three-level nested optimization system, enabling collaborative training of self-referential memories for test-time adaptation under non-IID data and long-tail distributions.
This paper proposes RMemSafe, a reliability-gated extension for continual test-time adaptation that attenuates source anchoring when the frozen source's predictive entropy becomes high, preventing blind anchoring under source collapse. The method achieves state-of-the-art error reduction on the CCC benchmark.
This paper introduces TacoMAS, a framework for test-time co-evolution of agent capabilities and communication topology in LLM-based multi-agent systems. It demonstrates that jointly adapting fast capability loops and slow topology loops improves performance and stability over existing baselines.
FAAST proposes a forward-only method that compiles labeled examples into fast weights analytically, enabling efficient test-time supervised adaptation without backpropagation, achieving over 90% speedup and 95% memory savings while maintaining performance.
This paper proposes CAP-TTA, a test-time adaptation framework that uses preconditioned LoRA updates triggered by bias-risk scores to mitigate toxicity and bias in large language models during narrative generation, achieving faster optimization and better fluency than standard baselines.
TTL introduces a test-time textual learning framework for OOD detection with pretrained vision-language models such as CLIP, dynamically learning OOD semantics from unlabeled test streams without external OOD labels. The method uses pseudo-labeled samples and an OOD knowledge purification strategy to improve detection robustness across diverse and evolving OOD distributions.