geometric-blind-spot

#geometric-blind-spot

10 years of AI robustness tricks (PGD, RLHF, Data Augmentation) are actually computing the same hidden matrix. We proved what happens when you get it wrong.

Reddit r/ArtificialInteligence ↗ · 2026-05-26

A research paper proves that various AI robustness techniques (PGD, RLHF, data augmentation) all estimate the same deployment nuisance covariance matrix. Applying a geometric penalty term reduces sycophancy in Qwen2.5-7B from 38.5% to 13.5% and improves adversarial robustness by 14.8% over standard PGD-AT.

0 favorites 0 likes

geometric-blind-spot

10 years of AI robustness tricks (PGD, RLHF, Data Augmentation) are actually computing the same hidden matrix. We proved what happens when you get it wrong.

Submit Feedback