Tag
This paper introduces directional sharpness, a new metric for certifying the generalization performance of machine learning models that is both efficient to compute and more reliable than existing proxies like test accuracy or traditional sharpness, even when training deviates from prescribed procedures.
This paper evaluates the robustness of tabular foundation models to biologically inspired distribution shifts in microbiome data, finding that protecting discriminative features is insufficient and zero-imputation is the most harmful perturbation.
Introduces PHANTOM, a large-scale open-source dataset of pre-generated adversarial attacks for vision-language models, covering 10 high-level categories and 55 subcategories of harmful intents with 47,524 adversarial samples. The dataset aims to lower the barrier for adversarial research and enable systematic evaluation of VLM robustness and safety.
This paper identifies and analyzes the 'narration gap' in LLM-solver loops, where the soundness of formal solver outputs is compromised when LLMs narrate the result to users. Empirical studies show prompt injection can invert verified conclusions, and mitigation remains incomplete under adaptive attacks.
This paper introduces RPCL, a training-only framework for robust pair confidence learning in multimodal emotion-cause pair extraction, which improves discriminative separation of gold pairs from hard negatives and achieves significant gains in Pair F1 and AUPRC on three datasets.
This paper investigates LLM-based metrics for evaluating clinical significance in radiology report generation. It identifies discrimination bias in existing LLM evaluators and proposes training lightweight interpretable metrics to improve the balance between error detection and tolerance of harmless variations.
This paper introduces TS-Fault, a benchmark for evaluating time series forecasting models under structured fault scenarios like broken dependencies and regime changes, finding that clean-data accuracy often anti-correlates with robustness and that foundation models are especially fragile.
Veriphi is a GPU-accelerated neural network verification system that combines adversarial attacks with formal certification. It demonstrates that the effectiveness of training methods (standard, adversarial, certified) depends heavily on dataset complexity, with IBP dominating on the simpler MNIST and PGD on the more complex CIFAR-10, and it achieves a 5x verification speedup.
Proposes a prompt perturbation framework that generates perturbed prompt variants, filters out structurally inconsistent comparison patterns using graph-level consistency checks, then applies standard ranking methods to yield more reliable LLM rankings.
The paper identifies 'temporal credit dilution' in learned dynamics models where global readouts focus on spurious correlates rather than brief physical events. It proposes CREST, a training-free method that re-anchors pooled representations using event core estimates, improving out-of-distribution robustness.
RepSelect introduces a method for robust LLM unlearning that isolates forget-set-specific representations by collapsing top principal components of weight gradients, achieving 4-50× better robustness against relearning attacks compared to existing baselines across multiple model families.
The article criticizes attempts to reverse-engineer Fable 5 by copying surface behaviors and instead introduces Hephaestus Stormbreaker, a robustness control layer for coding agents that enforces scope locking, evidence loops, regression tests, and gate checks to prevent agent drift and early quitting.
Proposes CoCoGEC, a counterfactual generation framework that alters error-irrelevant contexts in GEC training data to improve model robustness, achieving significant F0.5 gains on perturbed benchmarks.
This paper evaluates the robustness of proof autoformalization models in Lean 4 under global and local perturbations, finding that current LLM-based models are sensitive to perturbations and often fail to faithfully reflect local changes.
GRAPE is a training framework that progressively exposes parameter space during adversarial training, achieving higher robust accuracy with fewer parameters compared to fixed-structure methods on CIFAR-10.
Introduces ChLogic, an English-Chinese aligned benchmark that tests whether large language models preserve logical reasoning performance across languages, revealing persistent gaps influenced by surface realization and translation artifacts.
This paper reanalyzes a prior study claiming lower AI literacy predicts greater AI receptivity, finding that the aggregate negative relationship masks heterogeneity: the effect is not significant for text AI tools but remains strong for non-text AI tools, indicating a narrower pattern of broader adoption rather than general receptivity.
This paper investigates whether large language models have stable preferences across different deployment contexts, finding that context can cause larger variations than prompt perturbations, suggesting that measured preferences are context-conditioned rather than fixed properties.
BadWorld is a label-free adversarial framework that reveals structural vulnerabilities in visual world models by generating imperceptible perturbations that cause catastrophic failures in future rollouts.
RL4IL introduces a reinforcement learning-guided retrieval method that uses soft fusion over frozen demonstration libraries to handle missing sensor modalities in robotic imitation learning at inference time, achieving high success rates under complete camera dropout.