Tag
This paper demonstrates that fine-tuned AI text detectors amplify a pretrained typicality axis rather than learning an AI-vs-human boundary, with raw encoder projections often matching or exceeding fine-tuned performance.
This paper investigates the resilience of AI-generated text detection methods (fine-tuned RoBERTa, Binoculars, text feature analysis, and ensembles) against paraphrasing attacks. It finds that ensembles that include Binoculars are the most effective but also the most vulnerable to attack, highlighting a trade-off between performance and resilience.
This paper introduces MELD, a detector for AI-generated text that uses multi-task learning with auxiliary heads for generator family, attack type, and source domain to improve robustness. MELD achieves strong performance on the RAID benchmark and maintains low false-positive rates under adversarial attacks.