Non-Parametric Machine Text Detection via Multi-View Gaussian Processes
Summary
This paper introduces a non-parametric multi-view Gaussian process framework for detecting machine-generated text that is robust to adversarial manipulations like paraphrasing. By combining complementary features and providing calibrated uncertainty, it outperforms existing detectors on held-out attacks.
# Non-Parametric Machine Text Detection via Multi-View Gaussian Processes
Source: [https://arxiv.org/html/2606.14060](https://arxiv.org/html/2606.14060)
Aleem Khan, Nicholas Andrews Department of Computer Science Johns Hopkins University \{aleem,noa\}@cs\.jhu\.edu
###### Abstract
Adversarial conditions such as paraphrasing and targeted style transfer sharply degrade the accuracy of machine text detectors. A document, however, carries multiple complementary signals (e.g., stylistic features, likelihood and rank-order features, and structural features), and an attack that suppresses one may leave others intact. While a parametric classifier can learn to combine these features given sufficient supervision, classifiers are prone to making confidently incorrect predictions when the distribution shifts (e.g., novel attacks or unseen language models). To address this, we propose a multi-view, non-parametric detection framework that extracts complementary feature views from the same document and aggregates per-view evidence through a Gaussian process ensemble. By aggregating evidence across views, an adversary must simultaneously defeat multiple independent axes of detection, substantially raising the cost of evasion. The Gaussian process formulation additionally provides calibrated probabilities and principled abstention on out-of-distribution inputs, supporting reliable deployment in high-stakes settings. We evaluate on three benchmarks spanning diverse generators and attacks (DetectRL, RAID, and the PAN 2025 shared task) and demonstrate that our multi-view detector maintains strong performance under the considered attacks, outperforming existing approaches against held-out attacks.
## 1 Introduction
As language models (LMs) have become more capable and widely available, text generated by LMs has become ubiquitous and increasingly hard to distinguish from human writing (Comanici and others, [2025](https://arxiv.org/html/2606.14060#bib.bib1); Grattafiori and others, [2024](https://arxiv.org/html/2606.14060#bib.bib3); OpenAI and others, [2024](https://arxiv.org/html/2606.14060#bib.bib2)). While LMs serve many positive use cases and have become closely intertwined with everyday workflows, detection of machine-generated content, especially in high-stakes domains, is increasingly of interest to many communities Ippolito et al. ([2020](https://arxiv.org/html/2606.14060#bib.bib15)); Gehring and Paaßen ([2025](https://arxiv.org/html/2606.14060#bib.bib40)). As generators have improved, so have detectors: a growing body of work on machine-generated text detection has yielded zero-shot statistical tests (Bao et al., [2024](https://arxiv.org/html/2606.14060#bib.bib4); Gehrmann et al., [2019](https://arxiv.org/html/2606.14060#bib.bib9); Hans et al., [2024](https://arxiv.org/html/2606.14060#bib.bib5)), trained classifiers Li et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib6)); Lee et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib16)); Tian et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib17)); Hu et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib18)), and commercial detection services Emi and Spero ([2024](https://arxiv.org/html/2606.14060#bib.bib19)). Under controlled conditions, where machine text is generated without modification or adversarial intent, these detectors achieve high accuracy Hans et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib5)). However, real-world deployment introduces a fundamentally harder problem: adversarial conditions in which machine text is *edited, rewritten, or obfuscated*, by a human or another LM, before it reaches the detector Thai et al. ([2026](https://arxiv.org/html/2606.14060#bib.bib20)).
Adversarial manipulation comes in many forms. In relatively simple cases, a user may attempt to alter machine-written documents with a prompt (Patel et al., [2024](https://arxiv.org/html/2606.14060#bib.bib10)). On the other end of the spectrum, a more sophisticated adversary may fine-tune generators to target specific types of detector directly (Nicks et al., [2024](https://arxiv.org/html/2606.14060#bib.bib13)), or through proxies (Wang et al., [2025](https://arxiv.org/html/2606.14060#bib.bib11); Soto et al., [2025](https://arxiv.org/html/2606.14060#bib.bib12)). Alternatively, machine text may be passed through a trained paraphraser to disrupt rank-order scores of tokens Krishna et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib7)), or through a pipeline of multiple paraphrasers, which can amplify this degradation. These attacks exploit different vulnerabilities, but most existing detectors rely on a single feature space, such as token-level probability under a reference language model Gehrmann et al. ([2019](https://arxiv.org/html/2606.14060#bib.bib9)) or stylistic fingerprints Soto et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib14)), and a targeted edit along that axis is sufficient to evade detection. Because an attack that suppresses one signal may leave others intact, we propose a multi-view, non-parametric framework that forces an adversary to solve a multi-objective evasion problem.
We begin by building a few-shot support for a target domain. Our method relies on having a small number of human and machine exemplars from a domain of interest (crucially, the attacks themselves are held out in our evaluations). For each of $K$ views (§[3.1](https://arxiv.org/html/2606.14060#S3.SS1)), we fit independent Gaussian process classifiers, yielding probabilities that naturally incorporate the GP's predictive uncertainty (§[3.3](https://arxiv.org/html/2606.14060#S3.SS3)). These probabilities are aggregated via a secondary linear model which produces a final calibrated uncertainty (§[3.4](https://arxiv.org/html/2606.14060#S3.SS4)).
Our contributions are as follows: (1) A multi-view detection framework that aggregates complementary views for robust detection under human editing and paraphrase attacks. (2) A Gaussian process ensemble that delivers calibrated uncertainty, together with a thorough analysis demonstrating the approach's robustness to various attacks. (3) Evaluation on diverse benchmarks (DetectRL, RAID, and the PAN 2025 shared task datasets), demonstrating strong performance under adversarial conditions where single-view detectors fail.
## 2 Preliminaries
### 2.1 Detection Under Adversarial Conditions Remains Difficult
Detection of AI-generated and AI-manipulated content has received significant attention from the research community as generators have become more capable and accessible. Zero-shot detection methods score documents using a reference model, building on the idea that machine-generated text will generally be more likely under any language model Mitchell et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib34)); Bao et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib4)); Hans et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib5)); Su et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib8)); Yang et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib35)). Parametric methods have also demonstrated strong performance, but struggle to adapt to new distributions Solaiman et al. ([2019](https://arxiv.org/html/2606.14060#bib.bib38)); Li et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib6)); Hu et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib18)). Watermarking has emerged as another effective approach for detection, but it assumes access to the model during inference Kirchenbauer et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib43)).
Recent work has also demonstrated that many detection approaches are vulnerable to a range of attacks and adversarial conditions, and we replicate these findings Soto et al. ([2025](https://arxiv.org/html/2606.14060#bib.bib12)); Nicks et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib13)); Krishna et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib7)). Nicks et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib13)) in particular highlight a key risk of new detection methods, namely that they become targets to optimize against. Sadasivan et al. ([2025](https://arxiv.org/html/2606.14060#bib.bib42)) demonstrated that iteratively applying paraphrasing attacks significantly degrades performance. Recently released datasets have pivoted away from evaluating on strictly machine-generated text, and consider cases where humans and LMs may edit and manipulate each other's writing He et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib44)); Artemova et al. ([2025](https://arxiv.org/html/2606.14060#bib.bib37)); Dugan et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib23)); Wu et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib22)). Prior work has demonstrated that Gaussian processes trained on feature extractors are effective classifiers for synthetic speech detection Glazer et al. ([2025](https://arxiv.org/html/2606.14060#bib.bib26)). Most closely related to our work, Ghostbuster combines several LM-derived features for detection, applies a search procedure, and applies a logistic regression to arrive at a final answer Verma et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib36)). Our approach differs in that we combine distinct feature spaces and use Gaussian processes, which yield calibrated uncertainties with orders of magnitude less data.
### 2.2 Problem Statement
Figure 1: Overview of the proposed approach. Each input $x$ is represented via three complementary views and transformed into a probability via view-specific GPs. The three scalars are then projected into a single probability via logistic regression.

Let $\mathcal{X}$ denote the space of natural-language documents. We define our *detector* as a function $h:\mathcal{X}\to\{0,1\}$ that maps a document to a binary label, where $y=0$ indicates human-written text and $y=1$ indicates machine-generated or machine-manipulated text.
#### Zero\-shot detection\.
Most existing detectors operate in a*zero\-shot*setting: the detector is applied directly to a test document with no domain\-specific training data\. Zero\-shot methods, such as log\-rank tests, likelihood curvature estimates, and cross\-perplexity ratios, can easily be run on any generator’s output, in any domain, under any attack, with no adaptation\. However, this generality comes at a cost, with performance degrading substantially when the test distribution involves adversarial manipulation\.
#### Our setting: few\-shot, domain\-anchored detection\.
We adopt a different operating assumption\. We assume our system has access to a small*support set*of in\-domain documents:
$$\mathcal{S}=\{(x_i,y_i)\}_{i=1}^{N},\qquad y_i\in\{0,1\},\qquad N=N_H+N_M, \tag{1}$$

where $N_H$ human-written documents and $N_M$ machine-generated (and manipulated) documents are drawn from the domain of interest. In our main experiments we use $N_H=N_M=32$ documents per class to represent a quantity that is realistic to obtain in most practical scenarios (e.g., a set of known human-written essays, verified news articles, or authentic forum posts). We also consider system performance as $N$ increases to 256 ([Figure 2](https://arxiv.org/html/2606.14060#S4.F2)).
Crucially, we impose no requirement that the support set reflect the*attack*or the*generator*encountered at test time\. This generally aligns with many real\-world scenarios, where anticipating all possible generators or attacks isn’t feasible\. The human examples must come from the target domain, but the machine examples may be generated by any available model, even one different from the adversary’s generator\. In our evaluations, we explicitly hold out both the attack type and the source generator from training, evaluating in a cross\-attack and cross\-generator transfer setting\. That is, we train on a collection of attacks and generators*except*one, and evaluate on that setting\. Our key finding is that the GP\-based multi\-view ensemble is able to leverage this small, potentially mismatched support to generalize robustly to held\-out generators and unseen attacks \(§[4](https://arxiv.org/html/2606.14060#S4.T4)\)\.
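The held-out protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' split code: the dictionary keys `attack` and `generator`, the function name `leave_one_attack_out`, and the toy attack and model names are all hypothetical.

```python
def leave_one_attack_out(machine_docs, held_out_attack, held_out_generator=None):
    """Split machine documents into (support_pool, evaluation) for one held-out attack.

    Each document is a dict with the hypothetical keys 'attack' and 'generator'.
    """
    support_pool, evaluation = [], []
    for doc in machine_docs:
        if doc["attack"] == held_out_attack:
            # Documents under the held-out attack are only ever used for evaluation.
            evaluation.append(doc)
        elif doc["generator"] != held_out_generator:
            # Seen attack, non-held-out generator: eligible for the few-shot support.
            support_pool.append(doc)
    return support_pool, evaluation


# Toy corpus with made-up attack and generator names.
docs = [
    {"id": 0, "attack": "paraphrase", "generator": "model_a"},
    {"id": 1, "attack": "synonym", "generator": "model_a"},
    {"id": 2, "attack": "synonym", "generator": "model_b"},
    {"id": 3, "attack": "paraphrase", "generator": "model_b"},
]
support_pool, evaluation = leave_one_attack_out(
    docs, held_out_attack="paraphrase", held_out_generator="model_b"
)
```

Repeating the call once per attack gives one train/evaluate condition per held-out attack; the few-shot support would then be sampled from `support_pool`.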
#### Why few\-shot?
This formulation takes a middle ground between the zero\-shot paradigm \(no domain data, broad but fragile coverage\) and fully supervised approaches \(large labeled corpora, strong but narrow\)\. By anchoring the detector to a small sample from the deployment domain, we provide the model with enough distributional context to learn meaningful decision boundaries, particularly in the centroid based feature space \(§[3\.2](https://arxiv.org/html/2606.14060#S3.SS2)\), while keeping the data requirement low enough for practical adoption\.
## 3 Method
In order to develop a robust detector, we hypothesize that learning to combine the outputs of different models fit on complementary feature spaces will have several advantages over a joint model that learns arbitrary combinations of features across views\. First of all, by using independent view\-specific classifiers, an adversary must defeat all views simultaneously to fully evade detection\. Second, since we must learn detectors on the basis of small samples of confirmed real and fake data, a model that enabled arbitrary feature combinations would be prone to overfitting those features and therefore generalize poorly\.
To this end, we present a multi-view, non-parametric framework for machine-generated text detection. Given a document $x$, the system (i) extracts features from $K$ independent views $\{\varphi_k\}_{k=1}^{K}$, (ii) projects each view into a low-dimensional *distance feature* space, (iii) fits an independent variational Gaussian process (GP) classifier per view, (iv) obtains per-view Bernoulli probabilities $p_k$ via the probit link, (v) aggregates these probabilities, and (vi) calibrates a decision threshold with finite-sample false-positive guarantees. An out-of-distribution (OOD) gate enables principled abstention when the model encounters text outside its training support.
### 3.1 Multi-View Feature Extraction
Our approach to selecting views is simple: choose views whose feature spaces differ, so that an attacker is forced to solve a multi-objective problem. While we use a simple collection of three views, we find that adding additional views helps. Each view $\varphi_k:\mathcal{X}\to\mathbb{R}^{D_k}$ maps a raw text document to a feature vector that reflects a distinct axis of variation between human and machine writing. We use $K=3$ views:
#### Style view ($D_k=4$).
Dense stylometric embeddings produced by a style representation model ([https://huggingface.co/rrivera1849/LUAR-CRUD](https://huggingface.co/rrivera1849/LUAR-CRUD)), which is trained to encode writing style invariant to topic Rivera-Soto et al. ([2021](https://arxiv.org/html/2606.14060#bib.bib27)). These embeddings capture lexical and syntactic fingerprints that are largely orthogonal to semantic content. To avoid challenges associated with high-dimensional features, we fit centroids based on human and machine data, and compute distances from each centroid for each test point.
#### Probabilistic view ($D_k=2$).
A vector of zero-shot detector scores: (1) the LogRank score, which measures the average token rank under a reference language model, and (2) the FastDetectGPT score, which estimates the log-likelihood curvature around the document. Both scores are computed using Falcon-7B Almazrouei et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib31)).
#### Structural view ($D_k=8$).
A vector of hand\-crafted document statistics which capture surface\-level regularities in document organization that persist through many paraphrase operations, such as total token count and sentence count\. Our exact implementation can be found in[subsection A\.3](https://arxiv.org/html/2606.14060#A1.SS3)\.
The key motivation for multi\-view combination is adversarial robustness\. A paraphrase attack that normalizes token\-level probability features \(the probabilistic view\) leaves structural regularities and stylistic fingerprints largely intact\. Requiring converging evidence across views therefore raises the cost of any single\-axis attack\.
### 3.2 Centroid-Based Feature Construction
For high-dimensional embeddings (e.g., $D=512$), Euclidean distance concentration causes the GP's kernel to lose discriminative power: pairwise distances converge to a common value, and the GP posterior degenerates toward the prior. A simple solution is to project such views to a low-dimensional space. In more detail, given a labeled support set $\mathcal{S}=\{(x_i,y_i)\}_{i=1}^{N}$ (where $y_i\in\{0,1\}$ denotes human/machine), for each view $k$ we first compute class centroids:

$$\boldsymbol{\mu}_{H,k}=\frac{1}{|\mathcal{S}_H|}\sum_{i:y_i=0}\varphi_k(x_i),\qquad \boldsymbol{\mu}_{M,k}=\frac{1}{|\mathcal{S}_M|}\sum_{i:y_i=1}\varphi_k(x_i),$$

where $\mathcal{S}_H$ and $\mathcal{S}_M$ denote the human and machine subsets of the support set, respectively. Each document is mapped to a 4D feature vector combining Euclidean and diagonal Mahalanobis distances to both class centroids:

$$\psi_k(x)=\bigl[\,d_{H,k},\;d_{M,k},\;m_{H,k},\;m_{M,k}\,\bigr]\in\mathbb{R}^{4},$$

where $d_{H,k}=\|\varphi_k(x)-\boldsymbol{\mu}_{H,k}\|_2$ and $d_{M,k}=\|\varphi_k(x)-\boldsymbol{\mu}_{M,k}\|_2$ are Euclidean distances, and $m_{H,k}=\|(\varphi_k(x)-\boldsymbol{\mu}_{H,k})/\boldsymbol{\sigma}_{H,k}\|_2$ and $m_{M,k}=\|(\varphi_k(x)-\boldsymbol{\mu}_{M,k})/\boldsymbol{\sigma}_{M,k}\|_2$ are diagonal Mahalanobis distances, with $\boldsymbol{\sigma}_{H,k}$ and $\boldsymbol{\sigma}_{M,k}$ denoting the per-dimension standard deviations of the human and machine support features, respectively. The Mahalanobis components account for per-dimension variance, making the representation sensitive to distributional shape rather than scale alone. Finally, the distance feature vector is standardized (median/MAD); the exact form is given in Appendix [A](https://arxiv.org/html/2606.14060#A1). We denote the standardized features $\tilde{\psi}_k(x)$.
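The centroid construction above can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `mean_and_std` and `distance_features` are hypothetical, the small `eps` added to each standard deviation and the use of the population (rather than sample) standard deviation are assumptions, and the final median/MAD standardization is omitted because its exact form is given only in the appendix.

```python
import math


def mean_and_std(rows):
    """Per-dimension mean and population standard deviation of a list of vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in rows) / n) for j in range(d)]
    return mu, sd


def distance_features(x, mu_h, sd_h, mu_m, sd_m, eps=1e-8):
    """Return [d_H, d_M, m_H, m_M] for one view's feature vector x."""

    def euclidean(v, mu):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, mu)))

    def diag_mahalanobis(v, mu, sd):
        # eps guards against zero variance in a dimension (an assumption of this sketch).
        return math.sqrt(sum(((a - b) / (s + eps)) ** 2 for a, b, s in zip(v, mu, sd)))

    return [
        euclidean(x, mu_h),
        euclidean(x, mu_m),
        diag_mahalanobis(x, mu_h, sd_h),
        diag_mahalanobis(x, mu_m, sd_m),
    ]


# Toy 2-D support features for one view.
human = [[0.0, 0.0], [2.0, 2.0]]
machine = [[4.0, 4.0], [8.0, 8.0]]
mu_h, sd_h = mean_and_std(human)
mu_m, sd_m = mean_and_std(machine)
psi = distance_features([1.0, 1.0], mu_h, sd_h, mu_m, sd_m)
```

Whatever the dimension $D_k$ of the raw view, the output always has four entries, which is what keeps the downstream GP in a low-dimensional input space.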
### 3.3 Per-View Gaussian Process Classifiers
A separate Gaussian process binary classifier is fit for each view $k$ on its (optionally distance-reduced) features $\tilde{\psi}_k$. We use a sparse variational GP with a Bernoulli likelihood and a probit link Hensman et al. ([2015](https://arxiv.org/html/2606.14060#bib.bib33)), a Matérn-5/2 kernel, and inducing locations fixed to all $N$ training points (in our regime $N$ ranges from 8 to 128 per class, so this incurs no scaling concerns); the variational distribution and kernel parameters are jointly trained by maximizing the evidence lower bound Titsias ([2009](https://arxiv.org/html/2606.14060#bib.bib41)). Full training hyperparameters are listed in Appendix [A](https://arxiv.org/html/2606.14060#A1).
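For reference, the standard Matérn-5/2 covariance with a single lengthscale $\ell$ and output variance $\sigma_f^2$ is $k(r)=\sigma_f^2\bigl(1+\tfrac{\sqrt{5}\,r}{\ell}+\tfrac{5r^2}{3\ell^2}\bigr)\exp\bigl(-\tfrac{\sqrt{5}\,r}{\ell}\bigr)$ with $r=\|x-x'\|_2$. The sketch below evaluates it directly; the function name `matern52` and the isotropic parameterization are assumptions of this illustration, and the paper's actual kernel hyperparameters are those listed in its appendix.

```python
import math


def matern52(x, y, lengthscale=1.0, variance=1.0):
    """Isotropic Matern-5/2 covariance between two equal-length vectors."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    s = math.sqrt(5.0) * r / lengthscale
    # s * s / 3 equals 5 r^2 / (3 l^2), matching the closed form above.
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


k_same = matern52([0.0, 0.0], [0.0, 0.0])
k_near = matern52([0.0, 0.0], [0.5, 0.0])
k_far = matern52([0.0, 0.0], [3.0, 0.0])
```

The covariance equals the output variance at zero distance and decays smoothly as inputs move apart, which is why concentrated pairwise distances in high dimensions leave the kernel with little to discriminate on.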
At a test point $x^{*}$, the GP posterior over the latent function gives a predictive mean $\mu_k(x^{*})=\mathbb{E}[f_k(\tilde{\psi}_k(x^{*}))]$ and variance $\sigma^{2}_k(x^{*})=\operatorname{Var}[f_k(\tilde{\psi}_k(x^{*}))]$. The Bernoulli probability of machine generation for view $k$ is obtained via the probit link:

$$p_k(x^{*})=\Phi\!\left(\frac{\mu_k(x^{*})}{\sqrt{1+\sigma^{2}_k(x^{*})}}\right),$$

where $\Phi$ is the standard normal CDF. This probability naturally incorporates the GP's predictive uncertainty: when $\sigma^{2}_k$ is large, the argument to $\Phi$ is shrunk toward zero, pushing $p_k$ toward $0.5$ and reflecting the model's lack of confidence. The per-view probabilities $p_k$ are passed directly to the aggregation stage described next.
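The shrinkage effect of the probit formula is easy to check numerically. The sketch below assumes the latent mean and variance have already been produced by some GP; the function names `std_normal_cdf` and `probit_predictive` are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_predictive(mu, var):
    """Probability of the machine class given the latent predictive mean and variance."""
    return std_normal_cdf(mu / math.sqrt(1.0 + var))


p_confident = probit_predictive(2.0, 0.0)    # small latent variance
p_uncertain = probit_predictive(2.0, 100.0)  # same mean, large latent variance
```

With the same latent mean, the high-variance prediction lands much closer to 0.5 than the low-variance one, which is the behavior the aggregation stage relies on.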
### 3.4 Calibration
A simple max or mean over the per-view probabilities $\{p_k(x^{*})\}_{k=1}^{K}$ ignores systematic differences in calibration across views and is sensitive to a single poorly calibrated GP. We instead learn an aggregator in a data-driven manner using a small calibration set.
#### Calibration set\.
Of the $2N$ labeled documents available for a (domain, generator) pair, we use the first $N/2$ per class to fit the per-view GPs and hold out the remaining $N/2$ per class as a calibration set $\mathcal{C}$. Because the GPs never see $\mathcal{C}$, their predictions on it are unbiased estimates of test-time behavior, eliminating the calibration-leakage failure mode of in-sample stacking.
Table 1: Detection and calibration performance on the PAN2025 benchmark at $N=32$ training examples per class. $\pm$ indicates sample std over three samples. MC = machine-continued; MP = machine-polished. ECE and Brier are lower-is-better.

Table 2: Detection and calibration performance on the RAID benchmark at $N=32$ training examples per class. $\pm$ indicates sample std over three samples.
#### Second\-stage logistic regression\.
For each $x_i\in\mathcal{C}$ we collect the per-view probability vector $\mathbf{p}(x_i)=\bigl(p_1(x_i),\dots,p_K(x_i)\bigr)\in[0,1]^{K}$ and fit an $\ell_2$-regularized logistic regression. The learned weights expose how much each view contributes to the final detection score and can absorb miscalibration that any individual GP may introduce. This pipeline can be visualized in [Figure 1](https://arxiv.org/html/2606.14060#S2.F1). We also consider alternative aggregation strategies in [Appendix C](https://arxiv.org/html/2606.14060#A3).
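A second-stage aggregator of this kind can be sketched as a small $\ell_2$-regularized logistic regression trained by gradient descent on calibration-set probability vectors. This is an illustration only: the optimizer, the regularization strength `l2`, the learning rate, the unregularized bias, and the function names are all assumptions, and the toy probabilities are made up rather than taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_aggregator(P, y, l2=1.0, lr=0.5, steps=2000):
    """Fit weights w and bias b on per-view probability vectors P with labels y."""
    n, k = len(P), len(P[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for p, t in zip(P, y):
            err = sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b) - t
            for j in range(k):
                grad_w[j] += err * p[j]
            grad_b += err
        # Mean log-loss gradient plus the l2 penalty on the weights (bias unpenalized).
        w = [wi - lr * (g / n + l2 * wi / n) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def aggregate(p, w, b):
    """Final machine-text probability for one per-view probability vector."""
    return sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b)


# Toy calibration set: views 1 and 2 are informative, view 3 is near chance.
P_cal = [[0.9, 0.8, 0.5], [0.8, 0.9, 0.4], [0.1, 0.2, 0.6], [0.2, 0.1, 0.5]]
y_cal = [1, 1, 0, 0]
w, b = fit_aggregator(P_cal, y_cal)
```

In this toy example the two informative views receive positive weights, so inspecting `w` gives a rough picture of how much each view contributes to the final score.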
### 3.5 Evaluation Protocol
Each trained model is evaluated in a*cross\-attack setting*, where the model is evaluated against attacks it has never seen\.
#### Metrics\.
We report partial AUROC at a maximum false-positive rate of $1\%$ (AUROC@1%), which stress-tests the low-FPR regime relevant to high-stakes applications. We observe that AUROC over all operating points saturates, making differences between detectors hard to observe. We additionally report Brier score and Expected Calibration Error (ECE) to measure probabilistic calibration Naeini et al. ([2014](https://arxiv.org/html/2606.14060#bib.bib29)); lower is better for both of these metrics.
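The two calibration metrics can be sketched as follows. The Brier score is the mean squared error between predicted probability and label. For ECE, several binning conventions exist; the variant below bins on the predicted machine-class probability with equal-width bins, which is an assumption of this sketch since the text does not state which variant is used. Function names and the bin count are likewise illustrative.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE on the predicted probability of the positive class."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by the share of samples it holds.
        ece += (len(members) / n) * abs(mean_conf - frac_pos)
    return ece


probs = [0.8, 0.8, 0.8, 0.8, 0.8]
labels = [1, 1, 1, 1, 0]
brier = brier_score(probs, labels)
ece = expected_calibration_error(probs, labels)
```

In the toy example, predictions of 0.8 that are correct four times out of five are perfectly calibrated, so the ECE is essentially zero even though the Brier score is not.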
## 4 Experiments
Table 3: Detection and calibration performance on the DetectRL benchmark at $N=32$ training examples per class. $\pm$ indicates sample std over three samples.
### 4.1 Datasets
We evaluate our approach on a diverse set of datasets, each consisting of several attacks, domains, and language models. (We plan to publish our splits alongside our code to reproduce results.)

PAN2025 (Bevendorff et al., [2025](https://arxiv.org/html/2606.14060#bib.bib24)). The PAN2025 shared task considers five mixtures of human and machine writing: machine-written then human-edited, deeply-mixed text, human-initiated then machine-continued, human-written then machine-polished, and machine-written then machine-humanized ([https://pan.webis.de/clef25/pan25-web/generated-content-analysis.html](https://pan.webis.de/clef25/pan25-web/generated-content-analysis.html)). We focus our evaluations on the held-out “human-initiated then machine-continued” (MC) and “human-written then machine-polished” (MP) conditions, using the remaining machine splits to form a machine support.
RAID (Dugan et al., [2024](https://arxiv.org/html/2606.14060#bib.bib23)). We build custom splits from the publicly released RAID dataset to match our problem statement ([https://huggingface.co/datasets/liamdugan/raid](https://huggingface.co/datasets/liamdugan/raid)). Specifically, we sample data from three diverse domains (News, Reddit, and Amazon Reviews) and consider all attacks in the dataset Dugan et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib23)). Within each domain, as with DetectRL, we form a few-shot machine support by sampling exemplars from all attacks except one evaluation attack; the evaluation split contains human data and machine samples from that attack only. We also control for the language model: GPT-4o is held out from all training splits and only included as a held-out model during evaluation.
DetectRL (Wu et al., [2024](https://arxiv.org/html/2606.14060#bib.bib22)). We use the exact splits released by the authors to evaluate our detector ([https://github.com/NLP2CT/DetectRL](https://github.com/NLP2CT/DetectRL)). We evaluate on the “Multi-Attack” setting in our main experiments to match the problem statement outlined in [subsection 2.2](https://arxiv.org/html/2606.14060#S2.SS2) and to control for domain. To form the few-shot machine support, we sample exemplars from all attacks *except* for one evaluation attack; the evaluation split contains human data and machine samples from that attack only.
### 4.2 Baselines
We compare our approach to several popular zero-shot methods: Binoculars Hans et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib5)), Fast-DetectGPT Bao et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib4)), Log-Likelihood Log-Rank Ratio (LRR) Su et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib8)), Log-Rank Su et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib8)), MAGE Li et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib6)), RADAR Hu et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib18)), ReMoDetect Lee et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib16)), and MPU Tian et al. ([2024](https://arxiv.org/html/2606.14060#bib.bib17)). Some of these approaches are classifier-based and return interpretable scores between zero and one to discriminate between human and machine text, while others return a raw score. To compute calibration metrics on the latter, we calibrate the raw scores using Platt scaling on held-out calibration data Platt ([1999](https://arxiv.org/html/2606.14060#bib.bib25)). Noting that the previous approaches are all zero-shot and do not make use of the data our system uses beyond calibration, we also train a transformer-based binary classifier on the same data available to our system by fine-tuning RoBERTa Liu et al. ([2019](https://arxiv.org/html/2606.14060#bib.bib39)). Details for this classifier can be found in [Appendix B](https://arxiv.org/html/2606.14060#A2).
### 4.3 Main Experiments
Figure 2: Detection (left) and calibration (right) performance on the News split of the RAID dataset. The GP-based approach quickly reaches a strong performance point with a small number of training samples, and the variance of results also decreases significantly with more training samples. Shaded regions show $\pm 1$ standard error of the mean across different samples and evaluation conditions.

This section describes our experiments across adversarial data splits from the PAN2025 shared task ([Table 1](https://arxiv.org/html/2606.14060#S3.T1)), the RAID benchmark ([Table 2](https://arxiv.org/html/2606.14060#S3.T2)), and the DetectRL benchmark ([Table 3](https://arxiv.org/html/2606.14060#S4.T3)). Due to space constraints, we report results for two domains from the RAID dataset in [Table 2](https://arxiv.org/html/2606.14060#S3.T2) and for two attacks from the DetectRL benchmark in [Table 3](https://arxiv.org/html/2606.14060#S4.T3). Additional results with similar trends can be found in [Appendix E](https://arxiv.org/html/2606.14060#A5). First, we confirm findings from previous work and demonstrate that state-of-the-art detectors are brittle to attacks Krishna et al. ([2023](https://arxiv.org/html/2606.14060#bib.bib7)); Soto et al. ([2025](https://arxiv.org/html/2606.14060#bib.bib12)), especially under low-FPR operating conditions. The brittleness of these systems extends to calibration; further analysis of this consequence can be found in the following section. Throughout our results, the fine-tuned RoBERTa classifier outperforms most baselines and is most competitive with our system. We do note that the error bars for this baseline are significantly wider for many conditions, indicating that training is sensitive to the few-shot sample selected. Beyond RoBERTa, across most conditions considered, our few-shot approach significantly outperforms the considered zero-shot baselines under both detection and calibration metrics.
We report test results on all three datasets, with hyperparameters for our system chosen based on a separate development split of the PAN2025 dataset\.[Table 1](https://arxiv.org/html/2606.14060#S3.T1)shows performance under two difficult machine splits: human\-initiated and machine\-continued \(MC\) and human\-written and machine\-polished \(MP\)\. These results have the lowest absolute scores relative to other datasets due to the level of mixing human and machine text\.[Table 2](https://arxiv.org/html/2606.14060#S3.T2)shows performance for the three considered RAID domains, with performance averaged across all 11 considered attacks\. Full results for each attack can be found in[Appendix D](https://arxiv.org/html/2606.14060#A4)\.
How does system performance scale with few-shot samples? The baselines we consider are zero-shot; however, we do make use of few-shot samples to calibrate these approaches (§[4.2](https://arxiv.org/html/2606.14060#S4.SS2)). [Figure 2](https://arxiv.org/html/2606.14060#S4.F2) shows system performance against the number of few-shot training samples on the News split of the RAID dataset. All approaches are calibrated on the same data, and results are averaged across all 11 attacks. Our approach is able to make use of small amounts of data for detection, and there is significantly less variance associated with the selection of this data compared to other methods. Notably, system calibration improves significantly with the number of samples.
How effectively does our system trade dataset coverage for performance? We visualize this trade-off using a Performance-Coverage curve, an adaptation of the standard Risk-Coverage formulation Geifman and El-Yaniv ([2017](https://arxiv.org/html/2606.14060#bib.bib28)). [Figure 3](https://arxiv.org/html/2606.14060#S4.F3) shows performance (AUROC@1%) against dataset coverage on test data consisting of a held-out paraphrasing attack. To generate these curves we rank all test data by each system's uncertainty; as more of the dataset is covered (x-axis), the system is forced to produce responses it is more and more uncertain about. Across the three domains in the RAID dataset we observe that the GP's uncertainty is indeed informative: as abstention decreases, performance monotonically decreases. We observe that several baselines produce the opposite result, with performance increasing as abstention decreases, further confirming poor calibration.
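The ranking procedure behind such a curve can be sketched as follows. Two simplifications are assumptions of this illustration rather than the paper's setup: uncertainty is approximated by how close the predicted probability is to 0.5, and plain accuracy at a 0.5 threshold stands in for AUROC@1% to keep the example short. The function name `performance_coverage` is hypothetical.

```python
import math


def performance_coverage(probs, labels, coverages=(0.25, 0.5, 0.75, 1.0)):
    """Accuracy on the most confident fraction of samples, for each coverage level."""
    # Most confident first: largest distance of the probability from 0.5.
    order = sorted(range(len(probs)), key=lambda i: -abs(probs[i] - 0.5))
    curve = []
    for c in coverages:
        k = max(1, int(math.ceil(c * len(order))))
        kept = order[:k]
        correct = sum((probs[i] >= 0.5) == bool(labels[i]) for i in kept)
        curve.append((c, correct / k))
    return curve


# Toy predictions: the two least confident samples are the misclassified ones.
probs = [0.99, 0.01, 0.95, 0.05, 0.6, 0.4, 0.55, 0.45]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
curve = performance_coverage(probs, labels)
```

When uncertainty is informative, as in this toy case, performance can only fall as coverage grows, because the samples added last are the ones the system was least sure about.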
Figure 3: Performance\-Coverage curves on the News split of the RAID dataset\. Our system demonstrates a monotonically decreasing trend as it is forced to evaluate samples it is increasingly uncertain about\.

How does the system handle far\-OOD data? To further understand how well our system handles uncertainty, we consider far\-OOD data\. Our main results feature held\-out OOD attacks, but the test data comes from the same domain as the few\-shot exemplars, simulating a near\-OOD setting\. A well\-calibrated system should be uncertain about predictions on far\-OOD data, rather than confidently incorrect \(e\.g\., false positives when a detector observes human\-written text in a previously unobserved language\)\. To simulate this setting, we introduce human\-written and machine\-generated Arabic news\-wire data from the M4 dataset Wang et al\. \([2024](https://arxiv.org/html/2606.14060#bib.bib30)\)\. Figure [4](https://arxiv.org/html/2606.14060#S4.F4) demonstrates how a GP capturing our style view handles the far\-OOD data compared to near\-OOD data \(English data from the News split of the RAID dataset\)\. The posterior variance increases significantly for the Arabic data, which differs stylistically from any of the training data \(human or machine\)\.
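For intuition about why variance rises away from the training data, it may help to recall the textbook predictive variance of an exact GP regressor. This is the standard closed form, given here as an assumption\-laden illustration rather than the scalable variational classifier cited in Section 3\.3; the symbols $k$ \(kernel\), $x_i$ \(training inputs\), $x_*$ \(test input\), and $\sigma_n^2$ \(noise variance\) are introduced only for this sketch.

$$\sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*, \qquad [\mathbf{k}_*]_i = k(x_i, x_*), \quad K_{ij} = k(x_i, x_j)$$

For a kernel that decays with distance, such as the RBF kernel, $\mathbf{k}_* \to \mathbf{0}$ when $x_*$ is far from every training input, so the subtracted term vanishes and the variance reverts to its prior value $k(x_*, x_*)$\.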
Figure 4: Predictive variance of a GP fit on a style view\. English news\-wire data is observed during training; at test time, far\-OOD Arabic data results in significantly higher variances, raising the uncertainty of the system\.

How do different views affect system performance? [Table 4](https://arxiv.org/html/2606.14060#S4.T4) shows an ablation on view configurations\. We find that combining views improves detection performance across all views\.
Table 4: Per\-view ablation on the RAID dataset\. Best per column in bold\.
## 5Conclusion
Reliable detection of generated texts is complicated by the diversity of text genres, language models, prompting strategies, and evasion techniques\. A universal detector covering all conceivable settings may always be vulnerable to a new evasion strategy\. This paper addresses a more modest goal: how to develop specialized detectors for specific settings, and how to characterize the data that those detectors will work well on via calibrated uncertainty estimates\. To this end, we have shown that the proposed non\-parametric approach enables reliable detection using small amounts of human and machine\-generated data from the target distribution, that it yields well\-calibrated uncertainty estimates, and that the uncertainty estimates can detect both near\- and far\-OOD data\. Our experiments confirm that this is due to two key design choices: \(1\) the use of multiple complementary views capturing different facets of generated text; \(2\) the GP classification framework built on top of the distinct feature spaces and then aggregated via a learned calibration map\.
## Limitations
We experiment with three views \(stylistic, structural, and probabilistic\), which our experiments show have complementary benefits, enabling robustness to distribution shifts and evasion techniques\. Having shown the value of multiple views for robust detection, a natural next step would be to experiment with further feature spaces that could provide orthogonal benefits\. For example, looking at how features vary within documents could perhaps help detect when editing has taken place\. Even with the improved performance of our system, there are significant performance drops in the low\-FPR operating conditions, which are most like real\-world scenarios \(e\.g\., plagiarism or cheating accusations in academic settings\); more work is needed to validate these systems in such settings\. Separately, our experiments primarily employ established benchmarks for machine\-text detection, which are primarily English\. Our experiments in far\-OOD detection suggest that our approach is robust to languages other than those the detector is fit on \(appropriately raising uncertainty\), but future work should validate the proposed approach in truly multilingual settings\.
## References
- E\. Almazrouei, H\. Alobeidli, A\. Alshamsi, A\. Cappelli, R\. Cojocaru, M\. Debbah, É\. Goffinet, D\. Hesslow, J\. Launay, Q\. Malartic, D\. Mazzotta, B\. Noune, B\. Pannier, and G\. Penedo \(2023\)The falcon series of open language models\.External Links:2311\.16867,[Link](https://arxiv.org/abs/2311.16867)Cited by:[§3\.1](https://arxiv.org/html/2606.14060#S3.SS1.SSS0.Px2.p1.1)\.
- Beemo: benchmark of expert\-edited machine\-generated outputs\.InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\),L\. Chiruzzo, A\. Ritter, and L\. Wang \(Eds\.\),Albuquerque, New Mexico,pp\. 6992–7018\.External Links:[Link](https://aclanthology.org/2025.naacl-long.357/),[Document](https://dx.doi.org/10.18653/v1/2025.naacl-long.357),ISBN 979\-8\-89176\-189\-6Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- G\. Bao, Y\. Zhao, Z\. Teng, L\. Yang, and Y\. Zhang \(2024\)Fast\-detectGPT: efficient zero\-shot detection of machine\-generated text via conditional probability curvature\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=Bpcgcr8E8Z)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.5.4.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.5.4.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.5.4.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.16.12.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.16.12.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.28.24.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.28.24.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.28.24.7)\.
- J\. Bevendorff, D\. Dementieva, M\. Fröbe, B\. Gipp, A\. Greiner\-Petter, J\. Karlgren, M\. Mayerl, P\. Nakov, A\. Panchenko, M\. Potthast, A\. Shelmanov, E\. Stamatatos, B\. Stein, Y\. Wang, M\. Wiegmann, and E\. Zangerle \(2025\)Overview of pan 2025: voight\-kampff generative ai detection, multilingual text detoxification, multi\-author writing style analysis, and generative plagiarism detection\.InExperimental IR Meets Multilinguality, Multimodality, and Interaction: 16th International Conference of the CLEF Association, CLEF 2025, Madrid, Spain, September 9–12, 2025, Proceedings,Berlin, Heidelberg,pp\. 388–411\.External Links:ISBN 978\-3\-032\-04353\-5,[Link](https://doi.org/10.1007/978-3-032-04354-2_21),[Document](https://dx.doi.org/10.1007/978-3-032-04354-2%5F21)Cited by:[§4\.1](https://arxiv.org/html/2606.14060#S4.SS1.p1.1.1)\.
- G\. Comaniciet al\.\(2025\)Gemini 2\.5: pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities\.External Links:2507\.06261,[Link](https://arxiv.org/abs/2507.06261)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- L\. Dugan, A\. Hwang, F\. Trhlík, A\. Zhu, J\. M\. Ludan, H\. Xu, D\. Ippolito, and C\. Callison\-Burch \(2024\)RAID: a shared benchmark for robust evaluation of machine\-generated text detectors\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 12463–12492\.External Links:[Link](https://aclanthology.org/2024.acl-long.674/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.674)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1),[§4\.1](https://arxiv.org/html/2606.14060#S4.SS1.p2.1),[§4\.1](https://arxiv.org/html/2606.14060#S4.SS1.p2.1.1)\.
- B\. Emi and M\. Spero \(2024\)Technical report on the pangram ai\-generated text classifier\.External Links:2402\.14873,[Link](https://arxiv.org/abs/2402.14873)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- L\. Gehring and B\. Paaßen \(2025\)Assessing llm text detection in educational contexts: does human contribution affect detection?\.External Links:2508\.08096,[Link](https://arxiv.org/abs/2508.08096)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- S\. Gehrmann, H\. Strobelt, and A\. Rush \(2019\)GLTR: statistical detection and visualization of generated text\.InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations,M\. R\. Costa\-jussà and E\. Alfonseca \(Eds\.\),Florence, Italy,pp\. 111–116\.External Links:[Link](https://aclanthology.org/P19-3019/),[Document](https://dx.doi.org/10.18653/v1/P19-3019)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[§1](https://arxiv.org/html/2606.14060#S1.p2.1)\.
- Y\. Geifman and R\. El\-Yaniv \(2017\)Selective classification for deep neural networks\.InProceedings of the 31st International Conference on Neural Information Processing Systems,NIPS’17,Red Hook, NY, USA,pp\. 4885–4894\.External Links:ISBN 9781510860964Cited by:[§4\.3](https://arxiv.org/html/2606.14060#S4.SS3.p4.1)\.
- N\. Glazer, D\. Chernin, I\. Achituve, S\. Gannot, and E\. Fetaya \(2025\)Few\-shot speech deepfake detection adaptation with gaussian processes\.External Links:2505\.23619,[Link](https://arxiv.org/abs/2505.23619)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- A\. Grattafioriet al\.\(2024\)The llama 3 herd of models\.External Links:2407\.21783,[Link](https://arxiv.org/abs/2407.21783)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- A\. Hans, A\. Schwarzschild, V\. Cherepanova, H\. Kazemi, A\. Saha, M\. Goldblum, J\. Geiping, and T\. Goldstein \(2024\)Spotting llms with binoculars: zero\-shot detection of machine\-generated text\.External Links:2401\.12070Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.3.2.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.3.2.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.3.2.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.10.6.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.10.6.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.16.12.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.16.12.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.16.12.7)\.
- X\. He, X\. Shen, Z\. Chen, M\. Backes, and Y\. Zhang \(2024\)MGTBench: benchmarking machine\-generated text detection\.External Links:2303\.14822,[Link](https://arxiv.org/abs/2303.14822)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- J\. Hensman, A\. Matthews, and Z\. Ghahramani \(2015\)Scalable Variational Gaussian Process Classification\.InProceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics,G\. Lebanon and S\. V\. N\. Vishwanathan \(Eds\.\),Proceedings of Machine Learning Research, Vol\.38,San Diego, California, USA,pp\. 351–360\.External Links:[Link](https://proceedings.mlr.press/v38/hensman15.html)Cited by:[§3\.3](https://arxiv.org/html/2606.14060#S3.SS3.p1.4)\.
- X\. Hu, P\. Chen, and T\. Ho \(2023\)RADAR: robust AI\-text detection via adversarial learning\.InThirty\-seventh Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=QGrkbaan79)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.8.7.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.8.7.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.8.7.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.25.21.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.25.21.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.46.42.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.46.42.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.46.42.7)\.
- D\. Ippolito, D\. Duckworth, C\. Callison\-Burch, and D\. Eck \(2020\)Automatic detection of generated text is easiest when humans are fooled\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 1808–1822\.External Links:[Link](https://aclanthology.org/2020.acl-main.164/),[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.164)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- J\. Kirchenbauer, J\. Geiping, Y\. Wen, J\. Katz, I\. Miers, and T\. Goldstein \(2023\)A watermark for large language models\.InProceedings of the 40th International Conference on Machine Learning,A\. Krause, E\. Brunskill, K\. Cho, B\. Engelhardt, S\. Sabato, and J\. Scarlett \(Eds\.\),Proceedings of Machine Learning Research, Vol\.202,pp\. 17061–17084\.External Links:[Link](https://proceedings.mlr.press/v202/kirchenbauer23a.html)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1)\.
- K\. Krishna, Y\. Song, M\. Karpinska, J\. F\. Wieting, and M\. Iyyer \(2023\)Paraphrasing evades detectors of AI\-generated text, but retrieval is an effective defense\.InThirty\-seventh Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=WbFhFvjjKj)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1),[§4\.3](https://arxiv.org/html/2606.14060#S4.SS3.p1.1)\.
- H\. Lee, J\. Tack, and J\. Shin \(2024\)ReMoDetect: reward models recognize aligned LLM’s generations\.InThe Thirty\-eighth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=pW9Jwim918)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.9.8.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.9.8.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.9.8.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.31.27.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.31.27.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.58.54.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.58.54.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.58.54.7)\.
- Y\. Li, Q\. Li, L\. Cui, W\. Bi, Z\. Wang, L\. Wang, L\. Yang, S\. Shi, and Y\. Zhang \(2024\)MAGE: machine\-generated text detection in the wild\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 36–53\.External Links:[Link](https://aclanthology.org/2024.acl-long.3/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.3)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.7.6.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.7.6.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.7.6.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.22.18.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.22.18.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.40.36.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.40.36.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.40.36.7)\.
- Y\. Liu, M\. Ott, N\. Goyal, J\. Du, M\. Joshi, D\. Chen, O\. Levy, M\. Lewis, L\. Zettlemoyer, and V\. Stoyanov \(2019\)RoBERTa: a robustly optimized bert pretraining approach\.External Links:1907\.11692,[Link](https://arxiv.org/abs/1907.11692)Cited by:[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1)\.
- E\. Mitchell, Y\. Lee, A\. Khazatsky, C\. D\. Manning, and C\. Finn \(2023\)DetectGPT: zero\-shot machine\-generated text detection using probability curvature\.External Links:2301\.11305,[Link](https://arxiv.org/abs/2301.11305)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1)\.
- M\. P\. Naeini, G\. F\. Cooper, and M\. Hauskrecht \(2014\)Binary classifier calibration: non\-parametric approach\.External Links:1401\.3390,[Link](https://arxiv.org/abs/1401.3390)Cited by:[§3\.5](https://arxiv.org/html/2606.14060#S3.SS5.SSS0.Px1.p1.1)\.
- C\. Nicks, E\. Mitchell, R\. Rafailov, A\. Sharma, C\. D\. Manning, C\. Finn, and S\. Ermon \(2024\)Language model detectors are easily optimized against\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=4eJDMjYZZG)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- OpenAIet al\.\(2024\)GPT\-4 technical report\.External Links:2303\.08774,[Link](https://arxiv.org/abs/2303.08774)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- A\. Patel, N\. Andrews, and C\. Callison\-Burch \(2024\)Low\-resource authorship style transfer: can non\-famous authors be imitated?\.External Links:2212\.08986,[Link](https://arxiv.org/abs/2212.08986)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1)\.
- J\. Platt \(1999\)Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods\.External Links:[Link](https://api.semanticscholar.org/CorpusID:56563878)Cited by:[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1)\.
- R\. A\. Rivera\-Soto, O\. E\. Miano, J\. Ordonez, B\. Y\. Chen, A\. Khan, M\. Bishop, and N\. Andrews \(2021\)Learning universal authorship representations\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Online and Punta Cana, Dominican Republic,pp\. 913–919\.External Links:[Link](https://aclanthology.org/2021.emnlp-main.70/),[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.70)Cited by:[§3\.1](https://arxiv.org/html/2606.14060#S3.SS1.SSS0.Px1.p1.1)\.
- V\. S\. Sadasivan, A\. Kumar, S\. Balasubramanian, W\. Wang, and S\. Feizi \(2025\)Can AI\-generated text be reliably detected? stress testing AI text detectors under various attacks\.Transactions on Machine Learning Research\.External Links:ISSN 2835\-8856,[Link](https://openreview.net/forum?id=OOgsAZdFOt)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- I\. Solaiman, M\. Brundage, J\. Clark, A\. Askell, A\. Herbert\-Voss, J\. Wu, A\. Radford, G\. Krueger, J\. W\. Kim, S\. Kreps,et al\.\(2019\)Release strategies and the social impacts of language models\.arXiv preprint arXiv:1908\.09203\.Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1)\.
- R\. A\. R\. Soto, K\. Koch, A\. Khan, B\. Y\. Chen, M\. Bishop, and N\. Andrews \(2024\)Few\-shot detection of machine\-generated text using style representations\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=cWiEN1plhJ)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1)\.
- R\. R\. Soto, B\. Chen, and N\. Andrews \(2025\)Language models optimized to fool detectors still have a distinct style \(and how to change it\)\.External Links:2505\.14608,[Link](https://arxiv.org/abs/2505.14608)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1),[§4\.3](https://arxiv.org/html/2606.14060#S4.SS3.p1.1)\.
- J\. Su, T\. Y\. Zhuo, D\. Wang, and P\. Nakov \(2023\)DetectLLM: leveraging log rank information for zero\-shot detection of machine\-generated text\.InThe 2023 Conference on Empirical Methods in Natural Language Processing,External Links:[Link](https://openreview.net/forum?id=Dy2mbQIdMz)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.4.3.1),[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.6.5.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.4.3.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.6.5.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.4.3.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.6.5.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.13.9.4),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.19.15.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.13.9.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.19.15.4),[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.22.18.7),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.34.30.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.22.18.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.34.30.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.22.18.7),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.34.30.7)\.
- K\. Thai, B\. Emi, E\. Masrour, and M\. Iyyer \(2026\)EditLens: quantifying the extent of AI editing in text\.InThe Fourteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=gOkitaPCfZ)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p1.1)\.
- Y\. Tian, H\. Chen, X\. Wang, Z\. Bai, Q\. ZHANG, R\. Li, C\. Xu, and Y\. Wang \(2024\)Multiscale positive\-unlabeled detection of AI\-generated texts\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=5Lp6qU9hzV)Cited by:[Table 10](https://arxiv.org/html/2606.14060#A4.T10.5.2.1.1),[Table 8](https://arxiv.org/html/2606.14060#A4.T8.5.2.1.1),[Table 9](https://arxiv.org/html/2606.14060#A4.T9.5.2.1.1),[Table 11](https://arxiv.org/html/2606.14060#A5.T11.28.24.4),[Table 12](https://arxiv.org/html/2606.14060#A5.T12.28.24.4),[§1](https://arxiv.org/html/2606.14060#S1.p1.1),[Table 1](https://arxiv.org/html/2606.14060#S3.T1.52.48.7),[Table 2](https://arxiv.org/html/2606.14060#S3.T2.52.48.7),[§4\.2](https://arxiv.org/html/2606.14060#S4.SS2.p1.1),[Table 3](https://arxiv.org/html/2606.14060#S4.T3.52.48.7)\.
- M\. Titsias \(2009\)Variational learning of inducing variables in sparse gaussian processes\.InProceedings of the Twelfth International Conference on Artificial Intelligence and Statistics,D\. van Dyk and M\. Welling \(Eds\.\),Proceedings of Machine Learning Research, Vol\.5,Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA,pp\. 567–574\.External Links:[Link](https://proceedings.mlr.press/v5/titsias09a.html)Cited by:[§3\.3](https://arxiv.org/html/2606.14060#S3.SS3.p1.4)\.
- V\. Verma, E\. Fleisig, N\. Tomlin, and D\. Klein \(2024\)Ghostbuster: detecting text ghostwritten by large language models\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\),K\. Duh, H\. Gomez, and S\. Bethard \(Eds\.\),Mexico City, Mexico,pp\. 1702–1717\.External Links:[Link](https://aclanthology.org/2024.naacl-long.95/),[Document](https://dx.doi.org/10.18653/v1/2024.naacl-long.95)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1)\.
- T\. Wang, Y\. Chen, Z\. Liu, Z\. Chen, H\. Chen, X\. Zhang, and W\. Cheng \(2025\)Humanizing the machine: proxy attacks to mislead LLM detectors\.InThe Thirteenth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=PIpGN5Ko3v)Cited by:[§1](https://arxiv.org/html/2606.14060#S1.p2.1)\.
- Y\. Wang, J\. Mansurov, P\. Ivanov, J\. Su, A\. Shelmanov, A\. Tsvigun, C\. Whitehouse, O\. Mohammed Afzal, T\. Mahmoud, T\. Sasaki, T\. Arnold, A\. F\. Aji, N\. Habash, I\. Gurevych, and P\. Nakov \(2024\)M4: multi\-generator, multi\-domain, and multi\-lingual black\-box machine\-generated text detection\.InProceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics \(Volume 1: Long Papers\),Y\. Graham and M\. Purver \(Eds\.\),St\. Julian’s, Malta,pp\. 1369–1407\.External Links:[Link](https://aclanthology.org/2024.eacl-long.83/),[Document](https://dx.doi.org/10.18653/v1/2024.eacl-long.83)Cited by:[§4\.3](https://arxiv.org/html/2606.14060#S4.SS3.p5.1)\.
- J\. Wu, R\. Zhan, D\. F\. Wong, S\. Yang, X\. Yang, Y\. Yuan, and L\. S\. Chao \(2024\)DetectRL: benchmarking LLM\-generated text detection in real\-world scenarios\.InThe Thirty\-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track,External Links:[Link](https://openreview.net/forum?id=ZGMkOikEyv)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p2.1),[§4\.1](https://arxiv.org/html/2606.14060#S4.SS1.p3.1.1)\.
- X\. Yang, W\. Cheng, Y\. Wu, L\. R\. Petzold, W\. Y\. Wang, and H\. Chen \(2024\)DNA\-GPT: divergent n\-gram analysis for training\-free detection of GPT\-generated text\.InThe Twelfth International Conference on Learning Representations,External Links:[Link](https://openreview.net/forum?id=Xlayxj2fWp)Cited by:[§2\.1](https://arxiv.org/html/2606.14060#S2.SS1.p1.1)\.
## Appendix AGaussian Process Training Details
### A\.1Robust Standardization of Distance Features
Before being passed to the per\-view GP kernels, the distance features $\psi_k(x)$ defined in Section [3\.2](https://arxiv.org/html/2606.14060#S3.SS2) are normalized to a dimensionless scale using the support set’s median and median absolute deviation \(MAD\):

$$\tilde{\psi}_k(x) = \frac{\psi_k(x) - \operatorname{median}[\psi_k(\mathcal{S})]}{\operatorname{MAD}[\psi_k(\mathcal{S})] \times 1.4826}, \tag{2}$$

where the factor $1.4826$ makes the MAD consistent with the standard deviation under Gaussian data\. The use of median/MAD rather than mean/standard deviation makes the standardization robust to the heavy\-tailed distance distributions that arise under adversarial attacks\. This maps all views to a common scale, making GP kernel hyperparameters and predictive probabilities directly comparable across views\.
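The median/MAD standardization can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the function names `fit_robust_scaler` and `robust_standardize` and the toy support values are hypothetical, and the zero\-MAD guard is an addition of this sketch, not something the text specifies.

```python
from statistics import median
from typing import List, Sequence, Tuple

MAD_TO_STD = 1.4826  # consistency factor from Eq. (2)


def fit_robust_scaler(support: Sequence[float]) -> Tuple[float, float]:
    """Compute the median and the scaled MAD of one view's support-set distances."""
    center = median(support)
    mad = median([abs(v - center) for v in support])
    if mad == 0:
        # Guard added for this sketch: a constant support set has no scale.
        raise ValueError("MAD is zero; cannot standardize")
    return center, mad * MAD_TO_STD


def robust_standardize(values: Sequence[float], center: float, scale: float) -> List[float]:
    """Apply Eq. (2) using statistics fitted on the support set."""
    return [(v - center) / scale for v in values]


# Toy support set (hypothetical) with one heavy-tailed outlier.
support = [0.9, 1.0, 1.1, 1.2, 5.0]
center, scale = fit_robust_scaler(support)
scaled = robust_standardize(support, center, scale)
```

Because both statistics are medians, the single outlier at 5.0 does not move the center or inflate the scale, which is the robustness property the paragraph above relies on; a mean/standard\-deviation scaler fitted on the same values would be pulled toward the outlier.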
### A\.2Training Hyperparameters
Table [5](https://arxiv.org/html/2606.14060#A1.T5) lists the hyperparameters used to train each per\-view GP\. The same settings are used for every view, dataset, and training\-set size in our experiments\. All experiments were conducted on a single V100 GPU\.
Table 5: Per\-view GP training hyperparameters\.
### A\.3Structural View Implementation
The structural view summarizes each document with eight surface\-level features that are cheap to compute and require no learned model\. Tokenization uses NLTK’s punkt\_tab sentence and word tokenizers; paragraphs are approximated by splitting on double newlines\. Table [6](https://arxiv.org/html/2606.14060#A1.T6) lists the features\.
Table 6: Per\-document structural features\. Each document is represented as an $\mathbb{R}^{8}$ vector\.
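To make the flavor of such surface\-level features concrete, the sketch below computes three illustrative quantities. These are assumptions chosen for illustration and are not the eight features of Table 6, which are not reproduced here; the function name `structural_sketch` is hypothetical, and a simple regular expression stands in for the NLTK tokenizers so the example has no external dependencies. Only the double\-newline paragraph split is taken from the text.

```python
import re
from typing import Dict


def structural_sketch(document: str) -> Dict[str, float]:
    """Compute three illustrative surface features (not the paper's feature set)."""
    # Paragraphs are approximated by splitting on double newlines, as in the text.
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    # Stand-in sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    # Stand-in word tokenizer: runs of word characters.
    words = re.findall(r"\w+", document)
    return {
        "n_paragraphs": float(len(paragraphs)),
        "n_sentences": float(len(sentences)),
        "mean_words_per_sentence": len(words) / max(len(sentences), 1),
    }


doc = "First sentence here. Second one!\n\nA new paragraph starts. It has two sentences."
features = structural_sketch(doc)
```

Features of this kind need no learned model, which matches the stated motivation for the structural view.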
## Appendix BBaseline Classifier Training Details
We fine\-tune RoBERTa\-base \([https://huggingface\.co/FacebookAI/roberta\-base](https://huggingface.co/FacebookAI/roberta-base)\) on the exact same data splits seen by our system\. We train for 10 epochs using the AdamW optimizer and a learning rate of 2e\-5\.
## Appendix CAlternative Aggregation Strategies
Here we explore several alternative approaches to aggregating features and probabilities across views\. [Table 7](https://arxiv.org/html/2606.14060#A3.T7) shows cross\-attack performance with these strategies\. First, we consider eliminating the multi\-classifier approach in favor of concatenated features fed to either a logistic regression or a multi\-layer perceptron \(MLP\), denoted "LR/MLP \(Concat\)"\. This gives the system the ability to see all features at the same time, but makes single\-view attacks more difficult to identify as out of distribution\. We then consider replacing the multi\-GP configuration with multiple logistic regression or multiple MLP classifiers whose probabilities are mean pooled\. We find that the stacked GP configuration provides the most consistent performance\.
Table 7: DetectRL Cross\-Attack Generalization with various aggregation strategies\. Bold indicates best performance, underline indicates 2nd best per column\. The *Data Mix* attack does not contain in\-domain data\.
## Appendix DAdditional RAID Experiments
[Table 2](https://arxiv.org/html/2606.14060#S3.T2) averages results over the 11 considered attacks in the RAID benchmark\. We break down performance across those attacks in [Table 8](https://arxiv.org/html/2606.14060#A4.T8), [Table 9](https://arxiv.org/html/2606.14060#A4.T9), and [Table 10](https://arxiv.org/html/2606.14060#A4.T10)\.
Table 8: AUROC@1% per attack on the News split of the RAID dataset\. Italics indicate near\-chance performance \($<0.55$\)\.
Table 9: AUROC@1% per attack on the Reddit split of the RAID dataset\. Italics indicate near\-chance performance \($<0.55$\)\.
Table 10: AUROC@1% per attack on the Reviews split of the RAID dataset\. Italics indicate near\-chance performance \($<0.55$\)\.
## Appendix EAdditional Domains and Attacks
Due to space limitations, [Table 2](https://arxiv.org/html/2606.14060#S3.T2) only reported results on two domains and [Table 3](https://arxiv.org/html/2606.14060#S4.T3) only reported results on two attacks\. We report an additional domain from the RAID dataset \([Table 11](https://arxiv.org/html/2606.14060#A5.T11)\) and an additional attack from the DetectRL benchmark \([Table 12](https://arxiv.org/html/2606.14060#A5.T12)\)\. The results on these extra splits share similar trends with our main experiments in [subsection 4\.3](https://arxiv.org/html/2606.14060#S4.SS3)\.
Table 11: Detection and calibration performance on the RAID Reviews domain at $N=32$ training examples per class\. $\pm$ indicates sample std over three samples\.
Table 12: Detection and calibration performance on the DetectRL Prompt attack at $N=32$ training examples per class\. $\pm$ indicates sample std over three samples\.