Tag
A training-free framework for spatial reasoning from egocentric videos that enables revisiting conclusions through synthesized novel-view videos generated from predicted 3D geometry.
This paper introduces DyCon, a training-free framework that uses step-level embeddings to model evolving task difficulty and dynamically control reasoning depth in Large Reasoning Models, effectively reducing overthinking and improving efficiency without sacrificing accuracy.
Phase Marginalization is a post-hoc method that addresses phase-dependent instability in Vision Transformers by evaluating structured patch-grid phases and aggregating outputs. It improves segmentation, depth, and local matching over the canonical baseline with minimal extra cost.
ECI_sem is a training-free method for ranking hard negative sources in dense retrieval using frozen embeddings, achieving strong performance on MS MARCO and BEIR benchmarks.
This paper proposes a training-free, CPU-only retrieval method that fuses BM25 lexical scores with late-interaction dense scores for conversational memory retrieval, improving LoCoMo Hit@1 by up to 17.2 points over late interaction alone across six encoders. The study provides controlled ablations on pooling operators, reranker effects, and benchmark robustness, framing the gain as a division of labor between dense and lexical signals.
This paper proposes Dynamic Infilling Anchors (DIA), a training-free method for diffusion large language models that dynamically estimates end-anchor positions to enforce format constraints (e.g., parseable JSON, reasoning templates) while avoiding the rigidity of fixed-span approaches. Experiments show significant zero-shot gains on GSM8K and MATH benchmarks.
This paper proposes AXON, a training-free module that improves the quality-latency trade-off of discrete diffusion language model decoding by intelligently selecting 'anchor' tokens to reveal first, using attention, uncertainty, and confidence signals to support subsequent denoising steps. Experiments on reasoning and code-generation benchmarks show AXON reduces function evaluations while maintaining or improving accuracy.
RhymeFlow accelerates diffusion transformers for video generation by decoupling denoising trajectories across frames, using keyframe anchoring and latent trajectory projection to reduce computational overhead while maintaining visual quality.
PhaseLock is a training-free framework that preserves motion priors from early-step inference to improve physical consistency in image-to-video diffusion models, achieving a 6.2-point improvement with minimal overhead.
Fast-dLLM++ introduces Fréchet profile decoding for diffusion LLMs, a training-free method that selects parallel commit sets based on heterogeneous confidence profiles, achieving up to 37% higher throughput at comparable accuracy on benchmarks with LLaDA-8B.
WaveFilter proposes a training-free, wavelet-guided KV cache filtering framework for diffusion large language models that enhances long-context capability by precisely identifying key tokens and constructing sparse caches, improving performance on complex long-context tasks.
This paper proposes a training-free method to automatically generate fine-grained evaluation rubrics for LLM-as-a-judge without human annotation, and further introduces an iterative fine-tuning strategy for a rubric generator that outperforms larger proprietary models.
PlatonicNav introduces a training-free framework for embodied navigation that uses vision-only semantic maps and blind matching to ground language goals, achieving generalization across tasks and embodiments without explicit cross-modal training.
Proposes Chunk-Level Guided Generation, a training-free method that uses off-the-shelf LLMs as process scorers to select fixed-length candidate chunks during small-model generation, significantly improving mathematical reasoning accuracy over majority voting and PRM-guided search.
SkillAdaptor is a training-free step-level skill adaptation framework with explicit failure attribution for LLM agents, improving performance on WebShop, PinchBench, and Claw-Eval.
Proposes SERC, a training-free method inspired by LDPC codes to correct hallucinations in LLMs by treating generation as a noisy channel and using sparse verification queries against external evidence.
LVSA introduces a training-free sparse attention mechanism for video diffusion models, reducing compute by up to 3.17x while enabling generation beyond training horizons without quality loss.
Light Interaction introduces a training-free inference acceleration framework for interactive video world models, using adaptive context management, denoising cache acceleration, and 3D block sparse attention to achieve a speedup of up to 2.59x while maintaining competitive visual quality.
Group Prompting introduces a training-free framework for cell instance segmentation that requires only one click per cell type, recursively expanding prompts in the Segment Anything Model's feature space to achieve competitive performance.
EarlyTom is a training-free framework that compresses visual tokens early in the vision encoder to reduce time-to-first-token (TTFT) and computational cost while maintaining accuracy, achieving up to a 2.65x reduction in TTFT.