distribution-shift


The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

arXiv cs.AI · 17h ago

This paper identifies distribution shift and scale constraints as critical failure modes for statistical contamination detection methods in LLM benchmark auditing. An evaluation of three detection paradigms across 27 models yields only 199 correct outcomes out of 335 evaluations (about 59%), indicating a systematic reliability gap that prevents these methods from replacing transparent data provenance.


Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift

arXiv cs.LG · 17h ago

This paper introduces a theoretical framework for quantifying deployment risk when training and deployment distributions differ due to latent regime dynamics modeled as a Markov-switching process, providing exact decomposition and finite-sample bounds.


TASER: Task-Aware Stein Regularisation for Geometry-Driven Robustness

arXiv cs.LG · 2d ago

This paper introduces TASER, a training-time regularization framework derived from Langevin Stein operators that encourages geometric compatibility between predictors and the data density. It improves adversarial robustness and stability on CIFAR-10 without significant degradation in clean accuracy.


From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

arXiv cs.AI · 2026-05-27

This paper gives a theoretical account of context distribution shift in multi-turn dialogue RL and proposes Calibrated Interactive RL to mitigate it. The method couples interactive RL with simulator alignment to reduce the sim-to-real gap, and the authors report state-of-the-art performance.


MARGIN: Runtime Confidence Calibration for Multi-Agent Foundation Model Coordination

arXiv cs.LG · 2026-05-25

MARGIN is a runtime confidence calibration method for multi-agent foundation model systems. It learns per-agent calibration factors online, requires no held-out data or retraining, and improves pairwise resolution from below random to 70-89% on hard benchmarks.


MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation

arXiv cs.LG · 2026-05-22

This paper develops a PAC-Bayesian framework for test-time adaptation that uses MMD-balls as credal sets, providing formal generalization bounds and separating epistemic from aleatoric uncertainty under distribution shift.


PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift

arXiv cs.LG · 2026-05-19

This paper proposes Physics-Informed Multi-Scale Mamba (PIMSM), a state-space architecture that aligns model memory with physical timescales to improve robustness under distribution shift in scientific time series, demonstrating improvements on fMRI and weather forecasting tasks.


ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

arXiv cs.AI · 2026-05-18

This paper introduces ICRL, a framework that jointly trains a solver and critic with reinforcement learning to internalize critique guidance, enabling the solver to improve without external critique. It uses distribution calibration and role-wise group advantage estimation, achieving 6-7 point gains over GRPO on agentic and mathematical reasoning tasks.


When Informal Text Breaks NLI: Tokenization Failure, Distribution Shift, and Targeted Mitigations

arXiv cs.CL · 2026-04-21

This paper investigates how informal text (slang, emoji, Gen-Z filler tokens) degrades NLI accuracy in ELECTRA-small and RoBERTa-large models. It identifies two distinct failure mechanisms: tokenization failure (emoji mapped to [UNK]) and distribution shift (out-of-domain noise tokens). It then proposes targeted mitigations that recover accuracy without harming clean-text performance.


Beyond Surface Statistics: Robust Conformal Prediction for LLMs via Internal Representations

arXiv cs.CL · 2026-04-20

This paper proposes a conformal prediction framework for LLMs that uses internal representations rather than output-level statistics, introducing Layer-Wise Information (LI) scores as nonconformity measures to improve the validity-efficiency trade-off under distribution shift. Across QA benchmarks, the method is more robust to calibration-deployment mismatch than text-level baselines.
