Tag
This paper proposes a training-free "identify-before-answer" (IBA) framework for Knowledge-Based Visual Question Answering (KB-VQA) that decouples entity identification from evidence ranking, outperforming fine-tuned multimodal retrieval-augmented generation baselines while reducing complexity.
VESFlow is a training-free safety method for flow matching-based text-to-image generation that edits velocity fields to ensure safe output while maintaining prompt integrity.
Presents a training-free method for multi-hop retrieval-augmented generation that avoids costly graph rebuilds when underlying data changes, tackling the staleness issue in dynamic environments.
This paper introduces Confident Decoding, a training-free decoding strategy that dynamically selects the most reliable intermediate layer in LLMs using entropy-guided search, mitigating the alignment tax and improving reasoning performance on benchmarks like GPQA-Diamond and Omni-MATH with negligible overhead.
This paper identifies document-side early compression as a failure mode in long-document dense retrieval and introduces the Evidence Dilution Index (EDI) to measure it. The authors propose DICE, a training-free method that splits documents into chunks, encodes them independently, and aggregates them into a single vector, significantly improving retrieval on long documents.
JanusMesh is a fast, training-free framework that generates text-driven 3D visual illusions—a single mesh revealing different semantics from different viewing angles—by decoupling generation into cross-space dual-branch denoising and view-conditioned texture synthesis, achieving high realism in just 3-5 minutes.
This paper identifies an anchor collapse phenomenon in agentic search where parallel trajectories converge due to similar initial queries, and proposes DivInit, a training-free method that samples diverse initial queries to improve multi-hop question answering performance.
Proposes the Bag of Dims framework showing that the standard basis of transformer hidden states provides a training-free, architecture-general feature representation where dimensions encode semantic content via sign patterns; validated across language, vision, and audio models, achieving high accuracy with no learned rotations.
ASAG uses attention entropy to detect when reasoning is unproductive, stopping early to improve accuracy and reduce token generation. Experiments on Qwen3-8B show a 4.4% accuracy gain and over 40% fewer generated tokens.
DiRecT introduces a training-free algorithm for safe diffusion-based planning that enforces constraints only on final clean trajectories using receding-horizon denoising, improving safety and performance over existing methods.
HiDRA is a training-free method that uses high-dimensional random projection for activation steering in LLMs, capturing discriminative signals beyond linear methods and consistently outperforming existing baselines across diverse model families and benchmarks.
Introduces Adelic operation-preserved embeddings (AOE), a training-free representation that encodes numbers by combining each number's real value with its p-adic expansions, preserving additive and multiplicative structure. Achieves perfect accuracy on the Weaving Pattern benchmark.
This paper proposes a falsifiable applicability criterion for a training-free, fixed-length descriptor for multivariate time series based on time-lagged spectral embeddings, showing when it can be expected to work and validating it on multiple benchmarks.
NVIDIA introduces SpatialClaw, a training-free spatial reasoning agent that uses a VLM to write Python code in a persistent kernel, compose perception tools, and revise plans, achieving +11.2 points over prior agents on 20 benchmarks.
SkillCAT is a training-free framework for LLM agent skill self-evolution that addresses limitations of single-trace bias, unverified merging, and full corpus loading via three stages: Contrastive Causal Extraction, Assessment-Augmented Evolution, and Topology-Aware Task Execution, achieving up to 40.40% improvement on benchmarks.
Introduces RKSC, a training-free inference framework for multi-branch LLM reasoning that reduces KV cache redundancy via similarity-based sharing and early exit, achieving up to 3x speedup with minimal error.
This paper introduces Entropy-Guided Power Sampling (EGPS), a training-free and verifier-free sampler that improves the efficiency of power sampling for enhancing base language model reasoning. EGPS achieves up to 12.6x speedup over standard Metropolis-Hastings sampling while reaching best or tied-best accuracy on benchmarks like MATH500, HumanEval, and GPQA.
This paper introduces MGAP, a training-free decoding method that reduces hallucinations in Multimodal Large Language Models by adaptively suppressing only the harmful parts of language priors while preserving the model's semantic manifold. The method outperforms prior baselines on POPE and CHAIR benchmarks.
Dep-LLM is a training-free framework that uses frozen large language models to diagnose depression from clinical interviews by decomposing dialogue into five clinically aligned themes with evidence-grounded reasoning and confidence modulation, outperforming zero-shot and some supervised methods on DAIC-WOZ and E-DAIC datasets.
This paper proposes Prefilling-dLLM, a training-free framework that partitions the prefix into chunks and caches KV representations, achieving state-of-the-art quality and up to 28x speedup for long-context inference in diffusion language models.
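The DICE entry above describes a three-step pipeline: split a long document into chunks, encode each chunk independently, and aggregate the chunk vectors into a single document vector. A minimal sketch of that pattern follows, under loud assumptions: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a real dense encoder, and mean pooling followed by L2 normalization is an assumed aggregation rule, since the summary does not say how DICE aggregates. The names `DIM`, `split_into_chunks`, and `encode_document` are illustrative and not taken from the paper.

```python
import math
import zlib

DIM = 64  # hypothetical embedding size, chosen only for illustration


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_encode(text):
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return l2_normalize(vec)


def split_into_chunks(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def encode_document(text, chunk_size=32):
    """Encode each chunk independently, then aggregate into one vector.

    Aggregation here is mean pooling plus L2 normalization (an assumption).
    """
    chunks = split_into_chunks(text, chunk_size)
    if not chunks:
        return [0.0] * DIM
    chunk_vecs = [toy_encode(chunk) for chunk in chunks]
    pooled = [sum(column) / len(chunk_vecs) for column in zip(*chunk_vecs)]
    return l2_normalize(pooled)
```

Calling `encode_document(long_text, chunk_size=32)` returns one fixed-length vector regardless of document length, so each chunk contributes equally to the final vector instead of being squeezed through a single encoder pass over the whole document.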