Tag
This paper introduces a non-parametric multi-view Gaussian process framework for detecting machine-generated text that is robust to adversarial manipulations such as paraphrasing. By combining complementary features and providing calibrated uncertainty estimates, it outperforms existing detectors on held-out attacks.
This paper investigates evasion attacks on machine-text detectors, finding that current attacks degrade detector performance but that stylistic fingerprints persist. A novel paraphrasing approach that mimics human writing styles can evade even style-based detectors, but multi-document analysis recovers detectability.