When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer

arXiv cs.CL Papers

Summary

This paper proposes RidgeFT, a lightweight analytic update framework for lifelong machine-generated text attribution that adapts to new text generators without forgetting old ones, achieving strong performance across multiple evaluation settings.

arXiv:2606.05626v1 Announce Type: new Abstract: Machine-generated text (MGT) attribution aims to identify the specific generator responsible for a given text, thereby providing fine-grained evidence for model accountability and misuse investigation. As new large language models continue to emerge, attribution models must continuously incorporate new generators while preserving their ability to recognize previously seen ones. Prior works have shown that this lifelong MGT attribution setting is challenging, and existing methods often struggle to achieve a stable balance between adapting to new classes and retaining old ones. To address this issue, we propose RidgeFT, a lightweight analytic update framework that does not rely on exemplar replay. RidgeFT trains a task-aware encoder on the initial generator set, stores compact class-wise sufficient statistics when each generator class is first observed, and then freezes the encoder for replay-free closed-form updates. It then suppresses generator-irrelevant variation through covariance calibration, improves representation capacity with fixed random features, and updates new classes through closed-form ridge regression based on class-level sufficient statistics. Across multi-topic evaluations with varying initial generator setups, RidgeFT consistently outperforms baselines. It achieves the best macro-F1 across domains, backbones, and incremental protocols, while also improving both old-class retention and new-class adaptation. These results suggest that feature-stable analytic updates provide a simple yet effective approach to lifelong MGT attribution.

Cached at: 06/05/26, 08:07 AM

# When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer
Source: [https://arxiv.org/html/2606.05626](https://arxiv.org/html/2606.05626)
Zhen Sun$^{1,2,3}$, Yifan Liao$^{3}$, Zhicong Huang$^{2}$, Jiaheng Wei$^{3}$, Cheng Hong$^{2}$, Yutao Yue$^{3,4}$, Xinlei He$^{1\dagger}$

$^{1}$Wuhan University, $^{2}$Ant Group, $^{3}$The Hong Kong University of Science and Technology (Guangzhou), $^{4}$Institute of Deep Perception Technology, JITRI

Work done during an internship at Ant Group. Corresponding authors: Zhicong Huang (zhicong.hzc@antgroup.com), Xinlei He (xinlei.he@whu.edu.cn).

###### Abstract

Machine-generated text (MGT) attribution aims to identify the specific generator responsible for a given text, thereby providing fine-grained evidence for model accountability and misuse investigation. As new large language models continue to emerge, attribution models must continuously incorporate new generators while preserving their ability to recognize previously seen ones. Prior works have shown that this lifelong MGT attribution setting is challenging, and existing methods often struggle to achieve a stable balance between adapting to new classes and retaining old ones. To address this issue, we propose RidgeFT, a lightweight analytic update framework that does not rely on exemplar replay. RidgeFT trains a task-aware encoder on the initial generator set, stores compact class-wise sufficient statistics when each generator class is first observed, and then freezes the encoder for replay-free closed-form updates. It then suppresses generator-irrelevant variation through covariance calibration, improves representation capacity with fixed random features, and updates new classes through closed-form ridge regression based on class-level sufficient statistics. Across multi-topic evaluations with varying initial generator setups, RidgeFT consistently outperforms baselines. It achieves the best macro-F1 across domains, backbones, and incremental protocols, while also improving both old-class retention and new-class adaptation. These results suggest that feature-stable analytic updates provide a simple yet effective approach to lifelong MGT attribution.

## Introduction

As generative tools powered by large language models \(LLMs\) become increasingly widespread\[oyelude2024artificial,openclaw2026docs\], users can now conveniently rely on them for text generation and polishing\. While these capabilities improve writing efficiency, they also introduce potential risks of misuse\[kumarage2024survey\]\. For example, users may exploit LLMs to automatically produce large volumes of papers, news articles, reviews, and other forms of text, thereby disrupting the normal order of content production and undermining the credibility of human\-authored writing\[DBLP:journals/coling/WuYZYCW25\]\. Against this backdrop, the effective identification of machine\-generated text \(MGT\) has become an important problem\. To address this issue, existing studies have primarily focused on two aspects of MGT identification: binary detection and source attribution\[DBLP:journals/coling/WuYZYCW25,DBLP:conf/ccs/0001SC0024\]\. Compared with binary detection, MGT attribution further aims to identify the specific source generator, thereby providing finer\-grained evidence for accountability tracking and misuse investigation\[DBLP:conf/emnlp/CavaT25\]\.

![Refer to caption](https://arxiv.org/html/2606.05626v1/x1.png)

Figure 1: Illustration of the lifelong MGT attribution setting.

Existing MGT attribution methods typically assume a fixed set of generators, which is often an unrealistic assumption in real-world settings. Since attribution systems operate in a dynamic and open generator space in practice, they should not only recognize newly emerging generators, but also preserve the ability to distinguish previously seen ones. [DBLP:conf/kdd/LiuZL0ZWGT00025] introduces a more realistic setting, known as class-incremental MGT attribution, which we refer to as lifelong MGT attribution throughout this paper. Under this setting, the attribution model needs to be continuously updated as new generator classes arrive over time. However, due to factors such as computational cost, data licensing constraints, or the unavailability of historical data, it is usually impractical to recollect all past data and retrain the model from scratch [verwimp2023continual,huang2024mitigating]. At the same time, directly updating the model using only data from new classes often leads to catastrophic forgetting [mccloskey1989catastrophic,french1999catastrophic]. Therefore, a central challenge in lifelong MGT attribution is efficiently incorporating new generator classes under limited data while maintaining stable recognition of previously learned generators.

Under this challenge, we posit that the difficulty of lifelong MGT attribution does not necessarily need to be addressed by continuously updating the entire text encoder. Instead, a task-tuned encoder trained on the initial set of generators can already capture strong generator-related representations. If this encoder continues to be fine-tuned during the incremental stage, the representation space will continue to shift as new classes arrive, thereby making the decision boundaries of old classes unstable [caccia2021new,yu2020semantic]. We therefore seek to decouple the learning of new generators from deep representation updating, ensuring that incremental knowledge is absorbed without altering or disrupting the stable representation space. Motivated by this idea, we propose RidgeFT, an exemplar-free analytic update framework for lifelong MGT attribution. We consider a practical deployment scenario where the attribution system is trained and maintained from the initial stage. During initialization, RidgeFT uses the initial-class data to train a task-aware encoder and construct compact class-wise sufficient statistics; after that, the raw texts of old classes are discarded and never replayed. When new generators arrive, RidgeFT keeps the encoder frozen, maps the new data through covariance calibration and fixed random features, accumulates the corresponding statistics, and updates the classifier by a closed-form ridge solution. In this way, incremental learning is performed through statistical memory rather than historical-text replay or repeated encoder fine-tuning. We evaluate RidgeFT in a multi-topic setting [DBLP:conf/kdd/LiuZL0ZWGT00025,DBLP:conf/acl/0001Z0ZL0Z025], using P3, P4, and P5 protocols that start from 3, 4, and 5 initial classes and incrementally add 3, 2, and 1 new generator classes, respectively. Under the standard P5 protocol, RidgeFT achieves 0.886 full-F1, 0.902 old-class F1, and 0.804 new-class F1, improving full-F1 by 0.037 over the strongest continual-learning baseline.

Our contributions are summarized as follows:

- We identify lifelong MGT attribution as a generator-evolving attribution problem, where the model must preserve generator-specific decision boundaries while suppressing topic-, domain-, and prompt-induced nuisance variation under an exemplar-free update constraint.
- We propose RidgeFT, an exemplar-free analytic update framework that combines fractional covariance calibration, isotropic random feature lifting, and class-balanced closed-form ridge regression, enabling incremental updates using only compact class-wise sufficient statistics.
- We conduct extensive experiments across multiple topics, multiple target generators, and two backbones. The results show that RidgeFT's main advantage lies in substantially improving new-generator adaptation while maintaining competitive old-generator retention.

## Related Work

Establishing effective regulatory mechanisms for MGT has become essential for maintaining content credibility and supporting platform governance\[DBLP:journals/coling/WuYZYCW25\]\. Recent research has pushed MGT detection from idealized binary classification toward more complex real\-world scenarios\. Studies have not only developed dynamic benchmarks accounting for multilingual settings and model evolution\[DBLP:conf/acl/MackoKMS25,DBLP:conf/acl/YuYLC0YS25\], but also improved detector generalization to unseen generators and domains\[DBLP:conf/acl/HaoLZYM25,DBLP:conf/acl/Jiao0ZG025,DBLP:conf/emnlp/ChenHHZF25\], alongside enhancing robustness against adversarial attacks\[DBLP:conf/acl/LiZLSL25,DBLP:conf/naacl/LiYTJSCSS25,DBLP:conf/acl/PedrottiPCM0DE25\]\. Additionally, phenomena with blurred boundaries, such as human\-AI collaborative writing, have fallen into the scope of platform monitoring\[DBLP:conf/acl/SuWWZL25,DBLP:conf/acl/SahaF25,DBLP:conf/acl/0001Z0ZL0Z025\]\. However, although MGT detection techniques have made progress in addressing realistic challenges, merely determining whether a text is machine\-generated is no longer sufficient to satisfy the growing demands for accountability tracing and copyright attribution\.

![Refer to caption](https://arxiv.org/html/2606.05626v1/x2.png)

Figure 2: Overview of RidgeFT.

Compared with binary detection, MGT attribution further requires identifying the specific generator responsible for a text, providing critical evidence for model accountability and forensic analysis [DBLP:conf/ccs/0001SC0024,fang2025could], and has emerged as an important branch of authorship attribution [huang2025authorship]. While prior studies have explored practical attribution settings [sarvazyan2023overview,la2025authorship,najjar2025leveraging], they mostly consider static scenarios where the candidate generator set is fixed. In real-world deployment, rapid LLM updates require attribution models to evolve accordingly to recognize new generators. Simultaneously, data privacy and licensing restrictions often prevent full retraining on historical data. To address this, [DBLP:conf/kdd/LiuZL0ZWGT00025] pioneered class-incremental MGT attribution and evaluated mainstream continual learning methods. Nevertheless, existing approaches still struggle to preserve recognition performance on previously seen generators while learning the characteristics of new ones. Breaking this trade-off between adaptation to new classes and retention of old ones in lifelong MGT attribution remains an unresolved challenge, which is the central problem addressed in this work.

## Methodology

In lifelong MGT attribution, continuously fine-tuning the text encoder on new generators alters the representation geometry of old classes, thereby degrading previous decision boundaries. To address this, we propose RidgeFT, which reformulates lifelong attribution as an analytic ridge regression problem based on sufficient statistics. RidgeFT is designed for an attribution system that is trained and maintained from the initial stage, so the required statistics of initial classes can be recorded when those classes are first available. By freezing the base text encoder, RidgeFT preserves prior knowledge and performs analytic updates exclusively on the extracted features. Given an input text $x$, its frozen representation $h = f_{\theta}(x)$ is mapped to the final prediction via a novel sequential pipeline: covariance calibration, isotropic random feature lifting, and class-balanced ridge regression ($h \rightarrow \tilde{h} \rightarrow z(x) \rightarrow \hat{y}$). The calibration transform and the initial sufficient statistics are computed once from the base-stage training data and then stored. During later incremental stages, RidgeFT only processes newly arriving class data and does not revisit old raw texts.

Covariance Calibration. Base representations often capture generator-irrelevant variations (e.g., topic, length, domain) as high-variance directions, which impairs subsequent inner-product classifiers. To mitigate this, RidgeFT applies a fractional whitening transformation to suppress within-class noise while preserving the original discriminative geometry. This transform is estimated only from the base-stage training representations and is kept fixed after initialization, so incremental updates do not require replaying previous raw texts.

First, we calculate the within-class scatter matrix $S_w$:

$$S_w = \frac{1}{N - C_0} \sum_{c=1}^{C_0} \sum_{i: y_i = c} (h_i - \mu_c)(h_i - \mu_c)^{\top}, \tag{1}$$

where $N$ is the total number of base samples, $C_0$ is the number of base classes, $h_i$ is the representation of the $i$-th sample, and $\mu_c$ is the feature mean of class $c$. To address the instability of high-dimensional covariance estimation, we apply trace-scaled shrinkage:

$$S_w^{\text{shrink}} = (1 - \alpha) S_w + \alpha \frac{\operatorname{tr}(S_w)}{d_h} I_{d_h}, \tag{2}$$

where $\alpha$ is the shrinkage parameter, $\operatorname{tr}(\cdot)$ denotes the matrix trace, $d_h$ is the feature dimension, and $I_{d_h}$ is the identity matrix. Finally, we compute the fractionally whitened representation $\tilde{h}$ via the eigenvalue decomposition of $S_w^{\text{shrink}}$:

$$S_w^{\text{shrink}} = U \Lambda U^{\top}, \quad \Lambda = \operatorname{diag}(\sigma_1, \ldots, \sigma_{d_h}). \tag{3}$$

Given the eigendecomposition of $S_w^{\text{shrink}}$, we calibrate each principal direction according to its estimated within-class variance:

$$P_{\delta} = U (\Lambda + \epsilon I_{d_h})^{-\delta} U^{\top}, \quad \tilde{h} = P_{\delta}(h - \mu). \tag{4}$$

Here, $U$ and $\Lambda$ represent the eigenvectors and eigenvalues of $S_w^{\text{shrink}}$, respectively, with $\sigma_j$ denoting the $j$-th eigenvalue. The parameter $\delta \in [0, 1]$ controls the whitening strength, $\mu$ is the global feature mean, and $\epsilon$ is a small constant ensuring numerical stability. By applying a fractional exponent instead of full whitening, we attenuate dominant within-class variations without excessively distorting the original feature space. This provides a stable input for the downstream analytic classifier. As illustrated in [Figure 2](https://arxiv.org/html/2606.05626#S2.F2), this calibration is intended to reduce nuisance directions associated with topic, length, and domain, while preserving generator-related cues for attribution.
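The calibration step in Equations (1) to (4) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `fit_fractional_whitening` and `calibrate` and the default values of `alpha`, `delta`, and `eps` are assumptions chosen for the example.

```python
import numpy as np

def fit_fractional_whitening(H, y, alpha=0.1, delta=0.5, eps=1e-6):
    """Estimate the global mean and P_delta from base-stage features H (n x d_h).

    Hypothetical helper: names and default hyperparameters are illustrative.
    """
    classes = np.unique(y)
    n, d = H.shape
    mu = H.mean(axis=0)                       # global feature mean
    S_w = np.zeros((d, d))
    for c in classes:
        centered = H[y == c] - H[y == c].mean(axis=0)
        S_w += centered.T @ centered
    S_w /= (n - len(classes))                 # Eq. (1): within-class scatter
    S_shrink = (1 - alpha) * S_w + alpha * np.trace(S_w) / d * np.eye(d)  # Eq. (2)
    sigma, U = np.linalg.eigh(S_shrink)       # Eq. (3): symmetric eigendecomposition
    P = (U * (sigma + eps) ** (-delta)) @ U.T  # Eq. (4): scale each eigen-direction
    return mu, P

def calibrate(H, mu, P):
    """Apply h_tilde = P_delta (h - mu) to every row of H."""
    return (H - mu) @ P.T
```

Setting `delta=0` leaves the centered features unchanged, while larger values shrink high-variance within-class directions more aggressively. Both `mu` and `P` would be computed once on base-stage data and then reused for all later stages.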

Isotropic Random Feature Lifting. After covariance calibration, RidgeFT further lifts the calibrated representations into a fixed nonlinear random feature space to boost the expressive capacity of the analytic classifier without sacrificing closed-form incremental updates. By adopting a non-trainable random mapping [rahimi2007random] instead of a learnable projection layer, we ensure the feature basis remains strictly invariant upon the arrival of new generators.

Let $\tilde{h} \in \mathbb{R}^{d_h}$ denote the calibrated encoder representation, where $d_h$ is the feature dimension. RidgeFT samples a Gaussian random matrix $R \in \mathbb{R}^{d_{\phi} \times d_h}$ once before incremental learning, with each entry drawn independently as $R_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, 1/d_h)$. The lifted feature is computed as

$$z(x) = \operatorname{LN}\left(\operatorname{ReLU}(R\tilde{h})\right) \in \mathbb{R}^{d_{\phi}}, \tag{5}$$

where $d_{\phi}$ is the random feature dimension and $\operatorname{LN}(\cdot)$ denotes layer normalization. The matrix $R$ is fixed throughout all incremental stages, so only the sufficient statistics of the ridge classifier need to be updated.

We use an isotropic Gaussian projection rather than a data\-dependent projection learned from base classes\. In lifelong MGT attribution, future generators may introduce variation directions that are not present in the initial generator set\. A projection fitted to base\-class geometry may therefore overemphasize existing class separations and reduce coverage of unseen generator\-specific directions\. By contrast, isotropic random features provide a task\-agnostic nonlinear basis that expands the calibrated representation more uniformly, enabling the downstream ridge regressor to form richer decision boundaries without changing the representation space during continual learning\.
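A short NumPy sketch of the lifting in Equation (5) is given below. It assumes layer normalization without learnable scale or shift (the text does not say whether affine parameters are used), and the helper names and the `seed` argument are illustrative.

```python
import numpy as np

def make_random_projection(d_h, d_phi, seed=0):
    """Sample R once with i.i.d. N(0, 1/d_h) entries; it is never trained."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(1.0 / d_h), size=(d_phi, d_h))

def layer_norm(X, eps=1e-5):
    """Per-sample normalization over the feature axis (assumed: no affine terms)."""
    mean = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mean) / np.sqrt(var + eps)

def lift(H_tilde, R):
    """Eq. (5): z = LN(ReLU(R h_tilde)) for each row of H_tilde (n x d_h)."""
    return layer_norm(np.maximum(H_tilde @ R.T, 0.0))
```

Because `R` depends only on the seed and the dimensions, the same matrix can be regenerated or stored once and applied unchanged to every later generator class.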

Class-Balanced Analytic Ridge Regression. Using the random feature representation $z(x)$, RidgeFT trains the classification head via closed-form ridge regression. Because this analytic approach relies solely on second-order feature statistics, unlike iterative SGD-based methods, we only need to update the sufficient statistics for each class during the incremental stage, strictly eliminating the need to store raw text from old classes.

For each class $c$, RidgeFT stores only three sufficient statistics in the random feature space: the second-order statistic $A_c = \sum_{i: y_i = c} z_i z_i^{\top}$, the first-order statistic $q_c = \sum_{i: y_i = c} z_i$, and the sample count $N_c$. These statistics are computed when class $c$ is first observed and are then retained as statistical memory, so later updates do not require replaying its raw texts. When a new generator arrives, RidgeFT only computes the same statistics for the new class and appends them to the existing table. However, directly summing class statistics can bias the ridge solution toward classes with larger sample counts, since both $A_c$ and $q_c$ scale with $N_c$. To reduce this imbalance, RidgeFT introduces inverse-frequency weights based on class sample counts:

$$\omega_c = \frac{(N_c + \tau)^{-\beta}}{\frac{1}{|\mathcal{Y}_t|} \sum_{c' \in \mathcal{Y}_t} (N_{c'} + \tau)^{-\beta}}, \qquad c \in \mathcal{Y}_t, \tag{6}$$

where $\mathcal{Y}_t$ is the set of classes observed up to incremental stage $t$, $\beta$ controls the strength of reweighting, and $\tau$ prevents small classes from receiving excessively large weights. We then construct the ridge regression solution using the weighted statistics. Specifically, let $\bar{A} = \sum_{c \in \mathcal{Y}_t} \omega_c A_c$, and let the $c$-th column of $\bar{B}$ be $\omega_c q_c$. The final classifier is given by

$$W = (\bar{A} + \lambda I_{d_{\phi}})^{-1} \bar{B}, \tag{7}$$

where $\lambda > 0$ is the ridge regularization coefficient.

By mitigating the dominance of majority classes via reweighting, RidgeFT converts incremental updates into the simple accumulation of sufficient statistics and a single linear system solve. This enables the efficient absorption of new generators without encoder drift. At inference, the model predicts the class maximizing $s(x) = z(x)^{\top} W$. [Algorithm 1](https://arxiv.org/html/2606.05626#alg1) summarizes this overall procedure.

Algorithm 1: RidgeFT Update

Input: Frozen encoder $f_{\theta}$, calibration transform $P_{\delta}$, random matrix $R$, streams $\{\mathcal{D}_t\}_{t=0}^{T}$

Output: Ridge classifier $W$

1. Initialize seen classes $\mathcal{Y}_{-1} \leftarrow \emptyset$ and statistics $\{A_c, q_c, N_c\}$ as empty.
2. for $t = 0, \ldots, T$ do
3. $\mathcal{Y}_t \leftarrow \mathcal{Y}_{t-1}$.
4. for each $(x_i, y_i) \in \mathcal{D}_t$ do
5. if $y_i \notin \mathcal{Y}_t$ then
6. Add $y_i$ to $\mathcal{Y}_t$ and initialize $A_{y_i}, q_{y_i}, N_{y_i}$ as zero.
7. end if
8. $z_i \leftarrow \operatorname{LN}\left(\operatorname{ReLU}\left(R P_{\delta}(f_{\theta}(x_i) - \mu)\right)\right)$.
9. $A_{y_i} \leftarrow A_{y_i} + z_i z_i^{\top}, \quad q_{y_i} \leftarrow q_{y_i} + z_i, \quad N_{y_i} \leftarrow N_{y_i} + 1$.
10. end for
11. $\omega_c \leftarrow \dfrac{(N_c + \tau)^{-\beta}}{|\mathcal{Y}_t|^{-1} \sum_{c' \in \mathcal{Y}_t} (N_{c'} + \tau)^{-\beta}}, \quad \forall c \in \mathcal{Y}_t$.
12. $\bar{A} \leftarrow \sum_{c \in \mathcal{Y}_t} \omega_c A_c, \quad \bar{B} \leftarrow [\omega_c q_c]_{c \in \mathcal{Y}_t}$.
13. $W \leftarrow (\bar{A} + \lambda I_{d_{\phi}})^{-1} \bar{B}$.
14. end for
15. return $W$
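Steps 5 to 13 of Algorithm 1 can be sketched as a small NumPy class that keeps only $(A_c, q_c, N_c)$ per class and re-solves the ridge system on demand. This is an illustrative sketch, not the authors' code: the class name `RidgeHead`, the `predict` helper, and the default hyperparameters are assumptions, and the inputs are expected to be already-lifted features $z_i$.

```python
import numpy as np

class RidgeHead:
    """Statistical memory (A_c, q_c, N_c) plus the closed-form solve of Eq. (7)."""

    def __init__(self, d_phi, lam=1.0, beta=1.0, tau=0.0):
        self.d_phi, self.lam, self.beta, self.tau = d_phi, lam, beta, tau
        self.A, self.q, self.N = {}, {}, {}   # keyed by class label

    def accumulate(self, Z, y):
        """Fold lifted features Z (n x d_phi) into the per-class statistics."""
        y = np.asarray(y)
        for c in np.unique(y):
            Zc, key = Z[y == c], c.item()
            if key not in self.A:             # first time this class is observed
                self.A[key] = np.zeros((self.d_phi, self.d_phi))
                self.q[key] = np.zeros(self.d_phi)
                self.N[key] = 0
            self.A[key] += Zc.T @ Zc          # second-order statistic
            self.q[key] += Zc.sum(axis=0)     # first-order statistic
            self.N[key] += len(Zc)            # sample count

    def class_weights(self):
        """Eq. (6): inverse-frequency weights normalized to have mean 1."""
        classes = sorted(self.N)
        raw = np.array([float(self.N[c] + self.tau) ** (-self.beta) for c in classes])
        return classes, raw / raw.mean()

    def solve(self):
        """Eq. (7): W = (A_bar + lam I)^{-1} B_bar, one column per seen class."""
        classes, omega = self.class_weights()
        A_bar = sum(w * self.A[c] for w, c in zip(omega, classes))
        B_bar = np.stack([w * self.q[c] for w, c in zip(omega, classes)], axis=1)
        W = np.linalg.solve(A_bar + self.lam * np.eye(self.d_phi), B_bar)
        return classes, W

def predict(Z, classes, W):
    """Pick the class with the largest score z^T W for each row of Z."""
    return np.asarray(classes)[np.argmax(Z @ W, axis=1)]
```

In this sketch, calling `accumulate` stage by stage yields the same $W$ as accumulating all data at once, since the statistics are plain sums. Using `np.linalg.solve` rather than forming the explicit inverse is a numerical convenience of the example, not something the text specifies.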

## Experimental Setting

Task Protocol. To simulate the continual emergence of generators in real-world scenarios, we adopt a class-incremental MGT attribution benchmark targeting 1 human source and 5 LLMs. We evaluate our method under 3 incremental protocols: P3, P4, and P5, which begin with 3, 4, and 5 initial classes, respectively. While P5 follows the standard single-step incremental setting of prior work [DBLP:conf/kdd/LiuZL0ZWGT00025], P3 and P4 are introduced to further increase task difficulty. By starting with smaller base sets and introducing one new class per subsequent step, P3 and P4 involve more incremental stages, thereby imposing substantially stricter requirements on the model's resistance to catastrophic forgetting.
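As a concrete illustration of how the three protocols partition six classes, the sketch below splits an ordered class list into a base stage followed by single-class increments. The helper `incremental_stages` is hypothetical. The class order is an assumption: the last three entries follow the arrival order given in the Figure 3 caption for the academic setting, while the membership and order of the base set and the P5 target class are assumed for the example only.

```python
def incremental_stages(class_order, n_initial, step=1):
    """Split an ordered class list into a base stage plus incremental stages."""
    stages = [list(class_order[:n_initial])]
    for start in range(n_initial, len(class_order), step):
        stages.append(list(class_order[start:start + step]))
    return stages

# Assumed ordering for illustration; only the arrival order of the last
# three classes is taken from the Figure 3 caption.
ACADEMIC_ORDER = ["Human", "GPT-3.5", "Mixtral-8x7B",
                  "Moonshot", "Llama-3.1", "GPT-4o-mini"]

PROTOCOLS = {
    name: incremental_stages(ACADEMIC_ORDER, n_initial)
    for name, n_initial in [("P3", 3), ("P4", 4), ("P5", 5)]
}

for name, stages in PROTOCOLS.items():
    print(name, "base:", stages[0], "increments:", stages[1:])
```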

Dataset. We conduct experiments on two representative benchmark datasets, which correspond to two major real-world scenarios where LLM misuse poses particularly significant risks: rigorous academic writing and diverse social media interaction. (1) MGT-Academic [DBLP:conf/kdd/LiuZL0ZWGT00025]: This dataset focuses on the academic writing domain and covers three disciplines, namely STEM, Humanities, and Social Science. It includes human-authored text as well as text generated by five mainstream models, namely GPT-3.5 (https://openai.com/index/chatgpt), GPT-4o-mini [hurst2024gpt], Moonshot [moonshot2026official], Mixtral-8x7B [jiang2024mixtral], and Llama-3.1 [grattafiori2024llama]. We use the full dataset in our experiments, which contains approximately 73K samples in total. (2) AIGTBench (Social Media Subset) [DBLP:conf/acl/0001Z0ZL0Z025]: The texts in this subset are drawn from major social media platforms. It likewise contains human-written text and text generated by GPT-3.5, GPT-4o-mini, Llama-1 [touvron2023llama], Llama-2 [touvron2023llama2], and Llama-3.1. To ensure rigorous evaluation and class balance, we uniformly sample 15K instances from each class for our experiments.

Baselines & Implementation. We follow the experimental protocol of [DBLP:conf/kdd/LiuZL0ZWGT00025] and compare our method against several representative baselines. We include the classical methods LwF [li2017learning], iCaRL [rebuffi2017icarl], and BiC [wu2019large], as well as the more recent EASE [zhou2024expandable], PASS [zhu2021prototype], and SimpleCIL [zhou2025revisiting]. To ensure fair comparison, all replay-based methods use the same memory buffer setting, with 100 stored samples for each old class. As for the backbone, we adopt DeBERTa-v3-base [he2021debertav3] and RoBERTa-base [liu2019roberta] as the text encoders for feature extraction.

Evaluation Metrics. All experiments are conducted under the same data splits, incremental protocols, and memory budget. As the primary evaluation metric, we use macro-F1. To provide a more detailed analysis of model behavior, we report not only the overall macro-F1 over all seen classes, but also separate scores on old classes and new classes. This multi-dimensional evaluation enables us to quantify how different methods balance two competing objectives: mitigating catastrophic forgetting on old classes and adapting to newly introduced ones. Unless otherwise specified, we report the mean performance over three random seeds.
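The three reported scores can be sketched as below. This assumes that old-class and new-class F1 are obtained by averaging per-class F1 over the corresponding subset of labels, with predictions made over all seen classes; the text does not spell out this detail, and the function names are illustrative.

```python
import numpy as np

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 over the given class subset."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return float(np.mean(scores))

def full_old_new_f1(y_true, y_pred, old_classes, new_classes):
    """Report macro-F1 over all seen classes, old classes, and new classes."""
    return {
        "full": macro_f1(y_true, y_pred, list(old_classes) + list(new_classes)),
        "old": macro_f1(y_true, y_pred, old_classes),
        "new": macro_f1(y_true, y_pred, new_classes),
    }
```

For example, `full_old_new_f1(y_true, y_pred, old_classes=[0, 1], new_classes=[2])` returns a dictionary with the three scores for a single incremental stage.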

Table 1: Performance Comparison across Domains and Models. ∗ denotes methods using data replay. “Ori.” denotes the original base-stage model before incremental updating, i.e., the S0 performance on the initial generator classes, and is reported only as a reference rather than as a continual-learning baseline. Darker color indicates better performance. Bold values indicate the best result among continual-learning methods within each backbone block.

![Refer to caption](https://arxiv.org/html/2606.05626v1/pic/trajectory_P3_P4_academic.png)

Figure 3: Experiments on academic topics. P3 starts with 3 initial classes and sequentially adds Moonshot, Llama-3.1, and GPT-4o-mini; P4 starts with 4 initial classes and sequentially adds Llama-3.1 and GPT-4o-mini.

## Experiments

### Comparison with Baselines

We compare RidgeFT with a wide range of baselines under 3 evaluation protocols. Baselines marked with ∗, such as LwF∗, iCaRL∗, BiC∗, and EASE∗, denote methods equipped with a replay mechanism that stores historical samples.

As shown in [Figures 3](https://arxiv.org/html/2606.05626#S4.F3) and [6](https://arxiv.org/html/2606.05626#A0.F6), RidgeFT consistently outperforms all baselines across the P3 and P4 incremental stages in both domains. In the academic setting, encompassing the STEM, Humanities, and Social Science datasets, under P3 (sequentially adding Moonshot, Llama-3.1, and GPT-4o-mini), RidgeFT maintains strong old-class F1 (0.913, 0.870, 0.861) and new-class F1 (0.814, 0.729, 0.799) across the three stages. Under P4, it achieves full-F1 scores of 0.862 and 0.838, significantly outpacing the strongest baseline, which drops from 0.797 to 0.730. This highlights that replay-based baselines still struggle to balance old and new classes, while replay-free baselines such as PASS and SimpleCIL are even more limited when generator classes arrive sequentially. This robust advantage extends to the social media setting, where RidgeFT sustains high full-F1 scores across the P3 stages (0.945, 0.868, 0.855) and P4 stages (0.878, 0.867). Notably, at the final social media P4 stage, RidgeFT's new-class F1 reaches 0.899, substantially beating the strongest baseline (0.735). These results confirm that RidgeFT remains stable while absorbing continuous generator arrivals, without compromising past knowledge.

This advantage is further confirmed under the P5 protocol (refer to [Table 1](https://arxiv.org/html/2606.05626#S4.T1)). Averaged over the two backbones, RidgeFT achieves 0.886 full-F1, 0.902 old-class F1, and 0.804 new-class F1. Compared with the strongest full-F1 baseline, it improves full-F1 by 0.037. Compared with the best baseline on new-class F1, it improves new-class F1 by 0.107, while also outperforming the best old-class baseline on old-class F1. This reveals a core limitation of baselines: they are better at preserving the decision boundaries of previously learned features, yet remain less effective at efficiently absorbing information from new classes. In contrast, RidgeFT not only performs better in retaining old classes, but also delivers a substantial improvement on the more critical task of recognizing new classes, surpassing the strongest baseline by 0.107.

![Refer to caption](https://arxiv.org/html/2606.05626v1/pic/ratio_sweep_P5_5_full_macro_f1.png)

Figure 4: Full-F1 under varying target-class data proportions.

Finally, using GPT-4o-mini as the target generator (STEM / DeBERTa-base / P5), we analyze the impact of target-class data proportions. As shown in [Figure 4](https://arxiv.org/html/2606.05626#S5.F4), when reducing target-class data from 100% to 5%, RidgeFT's full-F1 remains highly stable (ranging from 0.919 to 0.908). Conversely, the strongest replay baseline (LwF∗) averages only 0.893, and the replay-free SimpleCIL drops to 0.831. This confirms that RidgeFT does not rely on abundant new-class data, demonstrating superior data efficiency and robustness even under severe low-resource incremental conditions.

![Refer to caption](https://arxiv.org/html/2606.05626v1/pic/ablation_band.png)

Figure 5: Parameter sensitivity of RidgeFT. We vary one hyperparameter at a time while keeping the others fixed, including the covariance calibration exponent $\delta$, trace shrinkage coefficient $\alpha$, random feature dimension $d_{\phi}$, class-reweighting strength $\beta$ under the 20% setting, smoothing constant $\tau$, and ridge regularization coefficient $\lambda$.

### Ablation Studies

Table 2: Component ablation of RidgeFT under different new-class data ratios. W denotes whitening, RFL denotes random feature lift, and CBR denotes class-balanced ridge. “Full” denotes full macro-F1 over all seen classes, while “New” denotes new-class F1.

Component-wise Ablation. We ablate the three core components of RidgeFT using GPT-4o-mini as the target generator (STEM / DeBERTa-base / P5). As shown in [Table 2](https://arxiv.org/html/2606.05626#S5.T2), each component contributes distinctly, and the three are highly complementary. When Whitening is used alone, the model can stabilize encoder features to some extent, but its adaptability remains limited under low-resource conditions. For example, with only 20% of the new-class data, the new-class F1 reaches only 0.741. In contrast, Random Feature Lift consistently outperforms Whitening across all data ratios, indicating that random nonlinear features enhance the separability of different generators. Class-Balanced Ridge (CBR) is the strongest individual component. Even with only 20% of the data, it achieves a new-class F1 of 0.865, suggesting that it plays a central role in mitigating the bias between old and new classes. Notably, both CBR and the full RidgeFT remain stable across data ratios ranging from 20% to 100%. According to [Equation 6](https://arxiv.org/html/2606.05626#S3.E6), when $\beta=1$ and $\tau=0$, the class weight satisfies $\omega_{c}\propto 1/N_{c}$, which exactly offsets the linear growth of the class statistics ($A_{c}$, $q_{c}$) with respect to the sample size $N_{c}$. Consequently, the ridge update effectively operates on class-averaged statistics, rendering the model largely insensitive to the sampling ratio once stable estimation is reached. Finally, combining all three components yields the best overall performance. Their complementarity is particularly evident in the 20% low-resource setting, where the full RidgeFT boosts the new-class F1 from 0.865 (CBR alone) to 0.878. Additional frozen-representation analysis is provided in [Appendix A](https://arxiv.org/html/2606.05626#A1).
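The reweighting argument above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the weight takes the form $\omega_{c}=(N_{c}+\tau)^{-\beta}$ (chosen only because it reduces to $1/N_{c}$ at $\beta=1$, $\tau=0$; Equation 6 itself is not reproduced here), that $A_{c}$ is the class Gram matrix, and that $q_{c}$ is the class feature sum paired with one-hot targets. The helper names `class_stats`, `class_weight`, and `ridge_from_stats` are hypothetical.

```python
import numpy as np

def class_stats(Z):
    """Return (A_c, q_c, N_c) for one class from its feature matrix Z (N_c x d)."""
    return Z.T @ Z, Z.sum(axis=0), Z.shape[0]

def class_weight(n, beta=1.0, tau=0.0):
    # Assumed weight form; it equals 1 / N_c when beta = 1 and tau = 0.
    return (n + tau) ** (-beta)

def ridge_from_stats(stats, lam=1.0, beta=1.0, tau=0.0):
    """Closed-form ridge solution from a list of per-class (A_c, q_c, N_c)."""
    d = stats[0][0].shape[0]
    A_bar = np.zeros((d, d))
    Q = np.zeros((d, len(stats)))
    for c, (A, q, n) in enumerate(stats):
        w = class_weight(n, beta, tau)
        A_bar += w * A
        Q[:, c] = w * q  # one-hot targets: class c only contributes to column c
    return np.linalg.solve(A_bar + lam * np.eye(d), Q)

rng = np.random.default_rng(0)
old = class_stats(rng.normal(size=(500, 8)) + 1.0)
new = class_stats(rng.normal(size=(100, 8)) - 1.0)

# Simulate 5x more new-class data with identical per-sample averages:
# A_c, q_c, and N_c all grow linearly with the sample count.
new_scaled = (5 * new[0], 5 * new[1], 5 * new[2])

W_small = ridge_from_stats([old, new], beta=1.0)
W_large = ridge_from_stats([old, new_scaled], beta=1.0)
W_unweighted_small = ridge_from_stats([old, new], beta=0.0)
W_unweighted_large = ridge_from_stats([old, new_scaled], beta=0.0)

print(np.allclose(W_small, W_large))                        # reweighted: same solution
print(np.allclose(W_unweighted_small, W_unweighted_large))  # unweighted: solution shifts
```

In this idealized setup the reweighted solution is identical at both sample counts, while the unweighted one is not. With real data the class averages would also change slightly as more samples arrive, which is why the text speaks of insensitivity only once estimation is stable.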

Parameter Sensitivity Ablation. [Figure 5](https://arxiv.org/html/2606.05626#S5.F5) presents the hyperparameter sensitivity of RidgeFT. Overall, it exhibits strong robustness to most hyperparameters. For example, varying the smoothing constant $\tau$ and the ridge regularization coefficient $\lambda$ causes almost no performance fluctuation, with Full-F1 remaining consistently around 0.919. From the perspective of feature stabilization, a moderate amount of trace shrinkage, with $\alpha\in[0.05,0.10]$, is already sufficient to ensure reliable covariance estimation. The covariance calibration exponent $\delta$ performs best in the range $[0.375,0.625]$, with Full-F1 reaching 0.919 at $\delta=0.5$. However, overly strong calibration, such as $\delta=1.0$, instead suppresses useful discriminative information in the features and leads to a performance drop. In terms of performance gains, the random feature dimension $d_{\phi}$ and the class reweighting strength $\beta$ play the most important roles. As $d_{\phi}$ increases from 512 to 8192, the higher-dimensional feature space substantially improves the separability of new generators, steadily raising Full-F1 and new-class F1 to 0.9201 and 0.890, respectively. In addition, under the low-resource setting with 20% data, strong reweighting with $\beta=1.0$ boosts new-class F1 by nearly 12 percentage points compared with the unweighted case $\beta=0$, increasing it from 0.760 to 0.878. This highlights its decisive role in alleviating the severe imbalance between old and new classes.

Table 3: Sufficient-statistic compression ablation (default setting: $d_{\phi}=4096$, 6 classes). Per-class schemes retain update flexibility, while merged schemes minimize storage by assuming future updates only add new classes, not new samples to old ones.

Sufficient-Statistic Compression Ablation. Although RidgeFT is exemplar-free, it is not memory-free. Its closed-form update still requires storing the statistics needed for ridge regression: the calibration parameters, the fixed projection matrix $R$, and the class-wise sufficient statistics $A_{c}$, $q_{c}$, and $N_{c}$. The high-dimensional $A_{c}\in\mathbb{R}^{d_{\phi}\times d_{\phi}}$ dominates this storage cost. Under the default setting ($d_{\phi}=4096$, 6 classes), the fp64 implementation requires 782.35 MiB in total, with the six class-specific $A_{c}$ matrices alone accounting for 768 MiB.

To reduce this bottleneck, we examine compression along two orthogonal dimensions. The first is numerical precision (fp32 or bf16), which preserves the full class-wise statistic table. This retains the flexibility to reweight classes post-training or incorporate new samples into previously seen classes. The second is class aggregation, which stores only the weighted merged matrix $\bar{A}=\sum_{c}\omega_{c}A_{c}$. This significantly reduces the memory footprint but folds the current class weights into the matrix, sacrificing the ability to update old classes individually. Importantly, this limitation is fully compatible with our P3/P4/P5 protocols, where incremental stages only introduce new generator classes.
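To illustrate why class aggregation is compatible with new-class-only updates, here is a small sketch under stated assumptions: the weights are taken as $\omega_{c}=1/N_{c}$, $q_{c}$ is treated as the class feature sum, and the cross-term table `Q_bar` is a hypothetical storage layout (the source only specifies how $\bar{A}$ is merged). It compares a merged update against a full recomputation from the per-class table.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 6, 1.0

def stats(Z):
    # Per-class sufficient statistics: Gram matrix, feature sum, sample count.
    return Z.T @ Z, Z.sum(axis=0), Z.shape[0]

old_classes = [stats(rng.normal(size=(200, d)) + shift) for shift in (-1.0, 0.0, 1.0)]
new_class = stats(rng.normal(size=(40, d)) + 2.0)

# Merged storage: the weights (assumed 1 / N_c) are folded into a single matrix,
# so the individual A_c of old classes are no longer available afterwards.
A_bar = sum(A / n for A, _, n in old_classes)
Q_bar = np.stack([q / n for _, q, n in old_classes], axis=1)

# New-class-only update: only the new class's statistics are required.
A_new, q_new, n_new = new_class
A_bar_next = A_bar + A_new / n_new
Q_bar_next = np.column_stack([Q_bar, q_new / n_new])
W_merged = np.linalg.solve(A_bar_next + lam * np.eye(d), Q_bar_next)

# Reference: recompute everything from the full per-class table.
all_classes = old_classes + [new_class]
A_ref = sum(A / n for A, _, n in all_classes)
Q_ref = np.stack([q / n for _, q, n in all_classes], axis=1)
W_ref = np.linalg.solve(A_ref + lam * np.eye(d), Q_ref)

print(np.allclose(W_merged, W_ref))
```

The two solutions coincide because adding a class only appends one weighted term to the sum. Adding samples to an old class would instead require changing that class's weight and its $A_{c}$, which the merged matrix can no longer separate out.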

As shown in [Table 3](https://arxiv.org/html/2606.05626#S5.T3), RidgeFT is highly robust to low-precision storage. Reducing the per-class $A_{c}$ to fp32 or bf16 substantially cuts storage while keeping the Full-F1 consistently above 0.919. Class aggregation yields even larger savings by removing redundant class-specific matrices. Among all variants, merged bf16 provides the best trade-off: it requires only 46.35 MiB (a $16.9\times$ compression ratio) while achieving a Full-F1 of 0.920. Overall, RidgeFT should be viewed as replay-free rather than storage-free. Merged bf16 is the most compact choice for our new-class-only protocols, whereas per-class bf16 is preferable if future updates may add samples to old classes.
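The reported storage figures follow from simple arithmetic on dense $d_{\phi}\times d_{\phi}$ matrices, sketched below. The 782.35 MiB and 46.35 MiB totals are taken from the text; the remaining non-$A_{c}$ storage (`other_mib`) is inferred here by subtraction and is an assumption, not a number reported in the paper. The helper `a_matrix_mib` is hypothetical.

```python
MIB = 1024 ** 2
BYTES = {"fp64": 8, "fp32": 4, "bf16": 2}

def a_matrix_mib(d_phi, n_matrices, dtype):
    """Storage of n_matrices dense d_phi x d_phi matrices, in MiB."""
    return n_matrices * d_phi * d_phi * BYTES[dtype] / MIB

d_phi, n_classes = 4096, 6

per_class_fp64 = a_matrix_mib(d_phi, n_classes, "fp64")  # six A_c matrices
merged_bf16 = a_matrix_mib(d_phi, 1, "bf16")             # single merged matrix

# Inferred, not reported: everything except the A matrices in the fp64 total.
other_mib = 782.35 - per_class_fp64

print(per_class_fp64)                                   # 768.0
print(merged_bf16)                                      # 32.0
print(round(782.35 / (other_mib + merged_bf16), 1))     # 16.9
```

Under this reading, the merged bf16 total of 46.35 MiB is consistent with a 32 MiB merged matrix plus the same remaining storage as in the fp64 configuration.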

## Conclusion

This paper addresses the challenge of lifelong MGT attribution: continuously learning new generators without suffering catastrophic forgetting or relying on historical text replay. To overcome the dilemma between representation drift (from fine-tuning) and memory overhead (from exemplars), we propose RidgeFT. By freezing a task-tuned encoder and formulating incremental adaptation as closed-form ridge regression, our replay-free framework balances old and new classes without encoder parameter updates. Extensive experiments confirm that RidgeFT achieves a superior trade-off against baselines, demonstrating that combining stable representations with analytic updates offers a scalable path for future MGT attribution systems.

## Limitations

Although RidgeFT achieves strong performance in lifelong MGT attribution, it still has several limitations.

- Storage Requirements and Compression: RidgeFT is replay-free rather than completely storage-free. In other words, although it does not require storing raw text samples from old classes, its closed-form ridge updates still rely on storing the covariance calibration parameters, the fixed random projection matrix, and the class-level sufficient statistics. While our compression study shows that the storage cost can be substantially reduced through low-precision storage and statistical merging, how to further compress these statistical quantities remains an important direction for future research.
- Multilingual Extension: Our experiments are mainly conducted on English datasets. Since RidgeFT operates in the encoder representation space rather than relying on language-specific surface-level textual features, we believe that it has the potential to extend to multilingual MGT attribution. A natural direction for future work is to combine RidgeFT with multilingual encoders and systematically evaluate its effectiveness across more languages and cross-lingual generator settings.
- Long-term Deployment and Representation Stability: Because RidgeFT does not continue updating the deep encoder during the incremental stage, but instead relies on a fixed representation space, a fixed random feature mapping, and closed-form ridge updates, it should not be interpreted as a method that can absorb an unlimited number of new generators over an arbitrarily long time horizon without performance degradation. As future generators become increasingly different from those seen during the initial training stage, the frozen representation may contain insufficient generator-specific information, which can weaken the effectiveness of later incremental updates. Therefore, in longer-term deployment scenarios, an important open question is how to introduce periodic encoder adaptation or calibration refresh while preserving representation stability.

## Ethical Considerations

This work focuses on lifelong machine\-generated text attribution under benchmark settings\. It does not collect private user data, involve human subjects, or introduce new generative or attack capabilities\. The main potential risk lies in the deployment of attribution systems: imperfect predictions may cause incorrect judgments about the source of a text\. Therefore, we suggest that MGT attribution models should be used as auxiliary evidence rather than as the sole basis for high\-stakes decisions\. Real\-world deployment should further consider domain shift, transparency, and human oversight\.

## References

![Refer to caption](https://arxiv.org/html/2606.05626v1/pic/trajectory_P3_P4_social_media.png)
Figure 6: Lifelong MGT attribution experiments on the social media topic. P3 starts with three initial classes and sequentially adds Llama-2, GPT-4o-mini, and Llama-3.1; P4 starts with four initial classes and sequentially adds GPT-4o-mini and Llama-3.1.

## Appendix A Sufficiency Analysis of Frozen Representations

Since RidgeFT freezes the task-tuned encoder during the incremental stage, a natural question arises: do frozen representations still preserve discriminative information for future generators? To answer this question, we compare three feature spaces derived from the same frozen encoder: the raw representation $h=f_{\theta}(x)$, the covariance-calibrated representation $\tilde{h}=P_{\delta}(h-\mu)$, and the random feature representation $z=\mathrm{LN}(\mathrm{ReLU}(R\tilde{h}))$. We diagnose these feature spaces using nearest-centroid probes and ridge probes, where none of the methods updates the encoder parameters.
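The three feature spaces can be sketched as a short pipeline. This is an illustration under explicit assumptions rather than the paper's implementation: the calibration matrix is assumed to be $P_{\delta}=\Sigma_{\alpha}^{-\delta/2}$, so that $\delta=1$ corresponds to full whitening, with trace shrinkage $\Sigma_{\alpha}=(1-\alpha)\Sigma+\alpha\,\tfrac{\mathrm{tr}(\Sigma)}{d}I$; $\mathrm{LN}$ is taken as layer normalization without learned affine parameters; and synthetic Gaussian vectors stand in for the frozen encoder outputs $h$. The function names are hypothetical.

```python
import numpy as np

def fit_calibration(H, delta=0.5, alpha=0.05):
    """Estimate mu and P_delta from frozen features H (n x d)."""
    mu = H.mean(axis=0)
    cov = np.cov(H, rowvar=False)
    d = cov.shape[0]
    # Assumed trace shrinkage toward a scaled identity.
    cov = (1.0 - alpha) * cov + alpha * (np.trace(cov) / d) * np.eye(d)
    evals, evecs = np.linalg.eigh(cov)
    # Assumed fractional whitening: P_delta = cov^(-delta / 2).
    P = (evecs * evals ** (-delta / 2.0)) @ evecs.T
    return mu, P

def layer_norm(X, eps=1e-5):
    # Per-sample normalization over the feature axis, no learned affine terms.
    mean = X.mean(axis=1, keepdims=True)
    var = X.var(axis=1, keepdims=True)
    return (X - mean) / np.sqrt(var + eps)

def lift(H, mu, P, R):
    """Map raw features h to z = LN(ReLU(R h_tilde))."""
    H_tilde = (H - mu) @ P.T
    return layer_norm(np.maximum(H_tilde @ R.T, 0.0))

rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 16))
H = base * np.linspace(0.5, 3.0, 16) + 0.5 * base[:, :1]  # correlated stand-in for h
R = rng.normal(size=(64, 16))                              # fixed random projection

mu, P = fit_calibration(H, delta=0.5, alpha=0.05)
Z = lift(H, mu, P, R)
print(Z.shape)

# Sanity check of the assumed form: delta = 1, alpha = 0 fully whitens the features.
mu_w, P_w = fit_calibration(H, delta=1.0, alpha=0.0)
cov_white = np.cov((H - mu_w) @ P_w.T, rowvar=False)
print(np.allclose(cov_white, np.eye(16), atol=1e-6))
```

Because $R$ is drawn once and then kept fixed, the same mapping can be applied to every later generator class, which is what allows class statistics computed at different times to be combined.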

![Refer to caption](https://arxiv.org/html/2606.05626v1/pic/tsne_feature_spaces.png)
Figure 7: t-SNE visualization of different frozen feature spaces. We compare the raw frozen representation $h$, the calibrated representation $\tilde{h}$, and the random feature representation $z$. The figure shows that the newly introduced generator does not completely collapse into the old classes, and that the feature transformations in RidgeFT make the class structure more clearly separated (STEM / DeBERTa-base / P5 setting).

Table 4: Sufficiency analysis of frozen representations. All methods use the same frozen encoder. Ridge on $h$ already achieves a relatively strong new-class F1, suggesting that the frozen representation retains discriminative information for future generators. RidgeFT further improves performance, indicating that covariance calibration and random feature mapping can exploit this information more effectively.

This analysis is conducted under the P5 protocol, where the model is first trained on five initial classes and GPT-4o-mini is introduced as the final new generator. As shown in [Table 4](https://arxiv.org/html/2606.05626#A1.T4), applying a ridge probe directly on the raw frozen representation $h$ already yields a new-class F1 of 0.844, which suggests that the frozen encoder is not limited to the initial classes and still preserves useful discriminative cues for subsequent generators. Building on this, RidgeFT further improves the new-class F1 to 0.860, while also maintaining stable old-class F1. On the other hand, the nearest-centroid probe improves from 0.495 on $h$ to 0.730 on $z$, indicating that the feature transformations in RidgeFT make the class structure more suitable for statistic-based analytic classification.

It should be noted that this analysis does not imply that a frozen encoder is always sufficient for arbitrary future generators. The random feature mapping can only enhance the usability of the information already present in the frozen representation, but it cannot recover generator-specific information that is entirely absent from the encoder. Therefore, RidgeFT is better suited to scenarios where the frozen representation still provides at least partial coverage of future generators. For generators that are extremely out of distribution, periodic adaptation of the encoder may still be necessary.

## Appendix B Artifact Licenses, Intended Use, and Data Safety

We use two publicly released benchmark datasets, MGT-Academic \[DBLP:conf/kdd/LiuZL0ZWGT00025\] and AIGTBench \[DBLP:conf/acl/0001Z0ZL0Z025\], and do not collect any new user data. MGT-Academic is released under the MIT license, and AIGTBench is released under the Apache 2.0 license. Both datasets are released for research on machine-generated or AI-generated text detection/attribution, and our use is consistent with their intended research purposes. We cite the original datasets and papers, and we do not redistribute the raw datasets as part of this work.

For privacy and content safety, we use the processed benchmark versions released by the original authors\. These datasets have already undergone preprocessing and moderation by their respective creators\. Our experiments only use text content and class labels for aggregate model training and evaluation, and we do not analyze, expose, or redistribute any individual\-level information\.

## Appendix C Software Implementations and Package Settings

All experiments were conducted in a Linux environment with kernel version 5\.4\.0\-174\-generic and Python 3\.12\.2\. We used PyTorch 2\.10\.0\+cu128 with CUDA 12\.8 on an NVIDIA L20 GPU with driver version 550\.54\.15\. Transformer encoders and tokenizers were implemented with HuggingFace Transformers 4\.57\.6, using DeBERTa\-v3\-base and RoBERTa\-base as the main backbones\. Encoder fine\-tuning followed a unified configuration with a maximum sequence length of 256, batch size 16, 3 training epochs, AdamW optimization, weight decay, and gradient clipping\.

Traditional machine learning classifiers and evaluation metrics were implemented with scikit-learn 1.7.2. Numerical computation relied on NumPy 2.0.1 and SciPy 1.15.2, while result aggregation and visualization used Pandas 2.2.3 and Matplotlib 3.9.1. For RidgeFT, we fixed the main hyperparameters across all primary experiments: fractional whitening exponent $\delta=0.5$, shrinkage coefficient $\alpha=0.05$, random feature dimension $d_{\phi}=4096$, ridge regularization coefficient $\lambda=1.0$, and class-balanced ridge coefficient $\beta=1.0$. Each experiment was run three times, and we report the average results.

## Appendix D Computational Experiments

We report the computational cost of different update strategies on a single NVIDIA L20 GPU. Under the P5 protocol with DeBERTa-v3-base, RidgeFT updates the classifier by recomputing sufficient statistics and solving a closed-form ridge regression problem, which takes only a few seconds for the newly added generator. In contrast, the joint-from-scratch baseline retrains a 184M-parameter DeBERTa-v3-base model on all old and new training texts, involving about 94k samples, and takes approximately 50 minutes on a single GPU. The initial five-class encoder training is shared across methods and takes 1,738 seconds.

These results show that RidgeFT achieves much lower incremental update cost than full retraining, while avoiding access to old raw texts during the update stage. This efficiency advantage is central to our lifelong MGT attribution setting, where attribution systems must incorporate newly emerging generators without repeatedly retraining the entire model.

## Appendix E Use of AI Assistants

We used AI assistants only for language polishing, grammar checking, and improving the readability of the manuscript\. All research ideas, experimental designs, implementation decisions, experimental results, analyses, and final claims were developed, checked, and approved by the authors\. The AI assistants were not used to independently generate, validate, or interpret experimental results, and they are not listed as authors\.

Table 5: New-class F1 across domains and models (final continual step). $\ast$ denotes methods using data replay. Darker color indicates better performance. Bold values are the highest among continual-learning methods of the RoBERTa and DeBERTa blocks in each row. The “Ori.” column is marked “—” because no new class has been introduced at S0.

Table 6: Old-class F1 across domains and models (final incremental step, initial-class subset). $\ast$ denotes methods using data replay. Darker color indicates better performance. Bold values indicate the best result among continual-learning baselines within each RoBERTa-base and DeBERTa-base block. The “Ori.” column reports the S0 macro-F1 on the initial classes, which is identical to full-F1 because all classes are old at S0.
