geometric-blind-spot


10 years of AI robustness tricks (PGD, RLHF, Data Augmentation) are actually computing the same hidden matrix. We proved what happens when you get it wrong.

Reddit r/ArtificialInteligence · 2026-05-26

A research paper argues that several AI robustness techniques, including projected gradient descent (PGD), reinforcement learning from human feedback (RLHF), and data augmentation, all estimate the same deployment nuisance covariance matrix. The paper reports that applying a geometric penalty term reduces sycophancy in Qwen2.5-7B from 38.5% to 13.5% and improves adversarial robustness by 14.8% over standard PGD adversarial training (PGD-AT).
