Attacks on Machine-Text Detectors Retain Stylistic Fingerprints
Summary
This paper investigates evasion attacks on machine-text detectors, finding that while current attacks degrade detector performance, stylistic fingerprints persist. A novel paraphrasing approach that mimics human styles can evade even style-based detectors, but multi-document analysis recovers detectability.
Cached at: 06/16/26, 11:34 PM
Source: https://huggingface.co/papers/2505.14608
Published on Jun 8
Submitted by Rafael (https://huggingface.co/rrivera1849) on Jun 16
Abstract
Machine-text detection remains challenging because of evasion attacks, but stylistic features can provide a robust defense when analyzed across multiple documents rather than individual instances.
Despite considerable progress in the development of machine-text detectors, the ease with which machine-text can be manipulated to evade detection has led to suggestions that the problem is inherently intractable. In this work, we investigate the limits of such evasion strategies. We demonstrate that while current attacks, ranging from prompt engineering to detector-guided optimization, can effectively degrade performance of standard detectors, they fail to erase the underlying stylistic “fingerprints” of machine text. We show that few-shot detectors that utilize the stylistic feature space are robust to these evasion attempts, reliably detecting samples even from models explicitly tuned to prevent detection. This raises the question: does style represent a universal defense against machine-detection attacks? We demonstrate that the answer is “no” by introducing a novel paraphrasing approach that simultaneously optimizes for undetectability and adherence to specific human styles. We show that unlike prior methods, this attack effectively evades all considered detectors, including those that utilize writing style. However, we find that this evasion is not absolute: as the number of documents available for analysis grows, the human and machine distributions become distinguishable again. Overall, our findings suggest that reliable machine-text detection requires moving beyond single-document analysis to multi-document analysis.
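The closing claim of the abstract, that distributions which overlap on a single document separate again as more documents are pooled, can be illustrated with a toy simulation. This is only a sketch under stated assumptions and not the paper's method or data: the per-document "style score" is a hypothetical scalar drawn from a unit-variance Gaussian, the `shift` between human and machine means is an arbitrary illustrative value, and pooling is a plain average over each author's documents.

```python
import random
import statistics


def auroc(neg, pos):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pooled_scores(mean, docs_per_author, n_authors, rng):
    """Average a hypothetical per-document style score over each author's documents."""
    return [
        statistics.fmean(rng.gauss(mean, 1.0) for _ in range(docs_per_author))
        for _ in range(n_authors)
    ]


def auroc_by_pool_size(pool_sizes, shift=0.3, n_authors=300, seed=0):
    """Return {docs_per_author: AUROC} for synthetic human vs. machine authors.

    Assumption: human scores ~ N(0, 1) and machine scores ~ N(shift, 1).
    These distributions are invented for illustration only.
    """
    rng = random.Random(seed)
    results = {}
    for k in pool_sizes:
        human = pooled_scores(0.0, k, n_authors, rng)
        machine = pooled_scores(shift, k, n_authors, rng)
        results[k] = auroc(human, machine)
    return results


if __name__ == "__main__":
    for k, auc in auroc_by_pool_size([1, 4, 16, 64]).items():
        print(f"docs per author = {k:>2}  AUROC = {auc:.3f}")
```

In this toy setting the single-document AUROC stays close to chance, while averaging over many documents shrinks the per-author variance and pulls the two groups apart. Real detectors operate on high-dimensional style representations, so the numbers here carry no meaning beyond the qualitative trend.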
Get this paper in your agent:
hf papers read 2505.14608
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
#### rrivera1849/style-aware-paraphraser-mistral7b
Text Generation • 7B • Updated 3 days ago • 65
Datasets citing this paper: 2
#### rrivera1849/style-aware-paraphraser-author-bank-reddit
Viewer • Updated 1 day ago • 12k • 52
#### rrivera1849/style-aware-paraphraser-outputs
Viewer • Updated 4 days ago • 31k • 46
Similar Articles
Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
This paper investigates the resilience of AI-generated text detection methods (fine-tuned RoBERTa, Binoculars, text feature analysis, and ensembles) against paraphrasing attacks, finding that Binoculars-inclusive ensembles are most effective but also most vulnerable to attacks, highlighting a dichotomy between performance and resilience.
Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement
This paper reveals the existence of hidden human-like spans in machine-generated texts and proposes a model-agnostic stacked enhancement framework that improves existing detectors by reducing the influence of these spans.
Been watching real adversarial input hit my detection API for six months. Here's what's actually landing.
A six-month analysis of real adversarial inputs reveals that simple multi-turn setups, forward-momentum exploitation, and role redefinition attacks consistently bypass single-message classifiers. The post argues that stateful monitoring of conversational context is more effective than improving one-shot detection.
Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text
This paper addresses the degradation of likelihood-based machine-generated text detectors by identifying a Simpson's paradox in token-score aggregation. It proposes a learned local calibration step that significantly improves detection performance across various models and datasets.
Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation
Proposes LiSCP, a lightweight stylistic consistency profiling method for robust detection of LLM-generated textual content, focusing on feature stability under adversarial manipulation. Achieves superior performance on in-domain and cross-domain detection with notable robustness.