inference-time

#inference-time

ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection

arXiv cs.AI · 2026-06-18

Proposes ARIADNE, a training-free, adapter-agnostic routing framework that selects the optimal PEFT adapter at inference time by measuring input proximity to adapter-specific centroids in embedding space, recovering 97.44% of upper-bound performance on 23 tasks.
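The centroid-proximity idea in this summary can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not state the embedding model or the distance measure, so cosine similarity, the toy 3-dimensional vectors, and the adapter names below are all assumptions made for illustration.

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(query, centroids):
    """Return the name of the adapter whose centroid is closest to the query."""
    return max(centroids, key=lambda name: cosine(query, centroids[name]))

# Hypothetical adapters, each summarized by the centroid of a few
# toy embeddings of inputs from its task (assumed data, not from the paper).
centroids = {
    "math_adapter": centroid([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "code_adapter": centroid([[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]),
}

# Route a new input embedding to the nearest adapter.
chosen = route([0.7, 0.3, 0.0], centroids)
print(chosen)
```

Because routing only compares the input embedding against precomputed centroids, a scheme like this needs no training of a router, which matches the "training-free" description in the summary.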

#inference-time

From Consumption to Reflection: Designing Human-AI Relations for Stable Reasoning

arXiv cs.AI · 2026-06-11

This paper introduces Relational Reflective Intelligence (RRI), an inference-time governance layer that uses auditable reasoning loops to stabilize human-AI reasoning, addressing cognitive vulnerabilities shared by humans and LLMs.

#inference-time

Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs

arXiv cs.LG · 2026-06-10

The paper introduces Probe-Conditioned Head Intervention (PCHI), an inference-time method for LLMs that selectively reduces overconfidence on wrong answers without significantly reducing confidence on correct ones, by conditionally rescaling attention head outputs when the model is likely wrong but confident.

#inference-time

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Hugging Face Daily Papers · 2026-06-10

Evoflux uses evolutionary search at inference time to repair failed tool workflows for compact language models, significantly improving execution feasibility compared with fine-tuning methods.

#inference-time

I built an inference-time epistemic framework that extends coherent LLM threads to 325k–1M tokens. Here's how it works.

Reddit r/artificial · 2026-06-05

An independent researcher introduces Epistemic Lattice Tethering (ELT), an inference-time scaffolding framework that extends coherent LLM threads to 325k–1M tokens by applying epistemic and ontological governance.

#inference-time

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

arXiv cs.AI · 2026-06-04

This paper demonstrates that LLM safety vulnerabilities extend beyond 'shallow safety' (first-token alignment) to any point during generation, showing that short token injections mid-sequence can redirect models toward harmful outputs. The authors propose training on generation trajectories with simulated mid-sequence perturbations to improve robustness.

#inference-time

The Digital Apprentice: A Framework for Human-Directed Agentic AI Development

arXiv cs.AI · 2026-06-04

This paper presents the 'Digital Apprentice,' a framework for scalable and safe agentic AI in which autonomy is earned incrementally through observational learning, human authorization, and continuous alignment correction. It introduces ADAPT, an inference-time control plane that operationalizes graduated autonomy tiers and converts human corrections into reusable preference data.

#inference-time

Hallucinations as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Contextual Orthogonalization

arXiv cs.CL · 2026-06-03

This paper proposes Dynamic Contextual Orthogonalization (DCO), an inference-time method that reduces hallucinations in large language models by aligning attention head outputs with the context manifold, achieving superior faithfulness on benchmarks with Llama-3 models.

#inference-time

Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs

arXiv cs.AI · 2026-06-02

Introduces Latent Reward Steering (LRS), an adaptive inference-time framework that uses sparse autoencoder latent states and a learned reward model to implicitly promote cognitive behaviors like verification and backtracking in reasoning LLMs, improving performance across multiple models and benchmarks.

#inference-time

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

arXiv cs.AI · 2026-06-02

TIGER is an inference-time framework that mitigates hallucinations in multimodal generation by extracting observation and claim graphs and assigning risk scores to repair unsupported facts. It reduces unsupported content across image-to-text, image+text-to-text, audio-to-text, and video-to-text tasks.

#inference-time

Scalable Inference-Time Annealing with Surrogate Likelihood Estimators

Hugging Face Daily Papers · 2026-06-01

SITA (Scalable Inference-Time Annealing) introduces a method for efficiently sampling molecular Boltzmann distributions by retraining flow-based models along a temperature ladder using energy-based surrogate likelihoods, avoiding costly divergence computations. The approach achieves state-of-the-art performance on Alanine Dipeptide and Tripeptide benchmarks.

#inference-time

DenseSteer: Steering Small Language Models towards Dense Math Reasoning

arXiv cs.AI · 2026-05-29

DenseSteer is a training-free inference-time framework that improves small language models' math reasoning by steering their internal representations towards dense reasoning patterns, achieving accuracy gains without increasing token-level negative log-likelihood.

#inference-time

SeDT: Sentence-Transformer Decision-Transformer Conditioning for Multi-Turn Conversation Reliability

arXiv cs.CL · 2026-05-27

The paper introduces SeDT, a training-free inference-time method that improves LLM reliability in multi-turn conversations by annotating conversation history with cumulative relevance scores from three signals, achieving performance gains of up to 37.7% on the Lost-in-Conversation benchmark.

#inference-time

Injecting Image Guidance into Text-Conditioned Diffusion Models at Inference

Hugging Face Daily Papers · 2026-05-24

Visual Concept Fusion (VCF) enables dual conditioning on both an image and text prompt in diffusion models at inference time without retraining, using a lightweight aligner and fusion strategy.

#inference-time

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

arXiv cs.AI · 2026-05-20

This paper studies whether tabular foundation models based on pretrained prior-data fitted networks (PFNs) can generalize to strategic tabular data where individuals modify features after deployment. It proposes Strategic Prior-data Fitted Network (SPN), an inference-time framework that aligns PFN predictions with the post-manipulation distribution without retraining.

#inference-time

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Hugging Face Daily Papers · 2026-05-20

FlowLong is an inference-time method for long video generation that uses overlapping sliding windows with Tweedie matching and stochastic early-phase sampling to improve temporal consistency and visual quality without additional training.

#inference-time

Effort as Ceiling, Not Dial: Reasoning Budget Does Not Modulate Cognitive Cost Alignment Between Humans and Large Reasoning Models

arXiv cs.CL · 2026-05-19

This paper tests whether varying inference-time reasoning effort affects the alignment between large reasoning models' chain-of-thought lengths and human reaction times. Results show alignment is invariant to effort perturbations, suggesting it is a training-time achievement.

#inference-time

Harnessing LLM Agents with Skill Programs

Hugging Face Daily Papers · 2026-05-18

HASP is a framework that upgrades agent skills into executable program functions acting as guardrails, enabling direct intervention in LLM agent loops and improving performance on complex tasks such as web search, math reasoning, and coding.

#inference-time

Inference-Time Machine Unlearning via Gated Activation Redirection

arXiv cs.LG · 2026-05-14

This paper introduces GUARD-IT, a training-free method for machine unlearning that uses input-dependent activation steering at inference time to remove targeted knowledge from LLMs without modifying weights, matching or exceeding gradient-based baselines while preserving utility and robustness to quantization.

#inference-time

SkillGen: Verified Inference-Time Agent Skill Synthesis

arXiv cs.LG · 2026-05-13

This article introduces SkillGen, a multi-agent framework that synthesizes and verifies reusable inference-time skills for LLM agents by contrasting successful and failed trajectories. The method ensures skills are auditable and empirically verified for their net positive impact on agent performance.
