Tag
A research paper proves that various AI robustness techniques (PGD, RLHF, data augmentation) all estimate the same deployment nuisance covariance matrix. Applying a geometric penalty term reduces sycophancy in Qwen2.5-7B from 38.5% to 13.5% and improves adversarial robustness by 14.8% over standard PGD-AT.