Tag
A post-hoc method reduces spurious correlations in fine-tuned LLMs by truncating the tail of the singular value decomposition (SVD) of the weight update matrix. It narrows the spurious-group gap by up to 5x at a cost of less than 2 percentage points of accuracy, and it requires neither retraining nor group labels.
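As a rough illustration of what truncating the tail of the update's SVD can look like, the sketch below takes the difference between fine-tuned and base weights, keeps only the top-k singular directions, and adds the truncated update back to the base weights. The function name `truncate_update` and the rank parameter `k` are hypothetical; the summary does not say how the cutoff is chosen or which layers are edited, so treat this as a sketch under those assumptions rather than the paper's procedure.

```python
import numpy as np

def truncate_update(w_base, w_ft, k):
    """Keep only the top-k singular directions of the update (w_ft - w_base).

    k is an assumed, user-chosen rank; the source does not state a selection rule.
    """
    delta = w_ft - w_base                      # weight update from fine-tuning
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = max(0, min(k, s.size))                 # clamp k to the valid range
    delta_k = (u[:, :k] * s[:k]) @ vt[:k, :]   # rank-k approximation of the update
    return w_base + delta_k                    # base weights plus truncated update

# Toy example with random matrices standing in for one layer's weights.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(6, 4))
w_ft = w_base + rng.normal(size=(6, 4))
w_edit = truncate_update(w_base, w_ft, k=2)
```

With `k` equal to the full rank the fine-tuned weights are recovered exactly, and with `k = 0` the base weights are returned, so `k` interpolates between the two models.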
SpurAudio is a new benchmark designed to evaluate shortcut learning and spurious correlations in few-shot audio classification. It reveals that state-of-the-art methods, including large pretrained audio foundation models, suffer significant performance degradation when background correlations are disrupted.
The paper proposes a method to mitigate spurious correlations by disentangling the learning dynamics of core and spurious features using a two-stage sample scoring function, achieving state-of-the-art debiasing performance with only 10% of the training data.
This paper analyzes spurious correlation learning in preference optimization methods like DPO, identifying mechanisms such as mean spurious bias and causal-spurious leakage. It proposes 'tie training' using equal-utility preference pairs as a mitigation strategy to reduce reliance on spurious features without degrading causal learning.
This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting examples where biased models are overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30%) while reducing bias reliance by ~4.85 percentage points.
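The PoE idea in the last summary can be sketched as follows, assuming the common formulation in which the log-probabilities of the main model and a bias-only model are summed and cross-entropy is taken on the renormalized product. The names `log_softmax` and `poe_loss` are illustrative, and the paper's exact setup (for example, how the biased model is built) is not given in the summary.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def poe_loss(main_logits, bias_logits, labels):
    """Cross-entropy on the product of the main and bias-only predictions."""
    # Summing log-probabilities corresponds to multiplying the two distributions.
    combined = log_softmax(main_logits) + log_softmax(bias_logits)
    log_probs = log_softmax(combined)          # renormalize the product
    return -log_probs[np.arange(len(labels)), labels].mean()

# One 3-class example where the main model is undecided.
labels = np.array([0])
main = np.zeros((1, 3))
uninformative_bias = np.zeros((1, 3))          # bias model knows nothing
confident_bias = np.array([[5.0, 0.0, 0.0]])   # bias model already confident and correct

loss_plain = poe_loss(main, uninformative_bias, labels)
loss_biased = poe_loss(main, confident_bias, labels)
```

When the bias-only model is already confident and correct, the combined loss is small, so the main model receives little training signal from that example; with an uninformative bias model the loss reduces to ordinary cross-entropy on the main model.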