MGI: Member vs Generated Inference

arXiv cs.LG 06/24/26, 04:00 AM Papers
Summary
Introduces the Member vs Generated Inference (MGI) task to distinguish training members from generated outputs in generative models, and proposes Data Circuit Breaker (DCB), a three-stage method combining autoencoder and latent generator signals, which outperforms existing methods across autoregressive and diffusion models.
arXiv:2606.23872v1 Announce Type: new Abstract: As generative models increasingly produce samples that are indistinguishable from human-created content, it becomes difficult to determine whether a given data point was part of a model's natural training set or was generated by the model itself, especially when models memorize and reproduce training data. We formalize this challenge as Member vs Generated Inference (MGI): given a sample and a target generative model, infer whether the sample is a true training member or a generated output of that model. Focusing on image generation, we show that existing membership inference methods systematically misclassify generated samples as training members, while attribution-based methods often misclassify true members as generated. This failure arises because both approaches rely on likelihood-related signals that are similarly elevated for training examples and for the model's own outputs. To address MGI, we propose Data Circuit Breaker (DCB), a three-stage method that combines complementary signals from a generative model's autoencoder and latent generator to distinguish training members from generated samples. Across multiple generative models, including image autoregressive and diffusion models, DCB consistently addresses the shortcomings of membership inference and attribution methods, remains effective even when models reproduce near-duplicates of training samples, and generalizes to challenging model derivative settings in which new models are trained on generated data.
Cached at: 06/24/26, 07:49 AM
# MGI: Member vs Generated Inference
Source: [https://arxiv.org/html/2606.23872](https://arxiv.org/html/2606.23872)
Bihe Zhao, Michel Meintz, Juangui Xu, Franziska Boenisch, Adam Dziedzic \{bihe\.zhao, michel\.meintz, juangui\.xu, boenisch, adam\.dziedzic\}@cispa\.de CISPA Helmholtz Center for Information Security

###### Abstract

As generative models increasingly produce samples that are indistinguishable from human-created content, it becomes difficult to determine whether a given data point was part of a model’s natural training set or was generated by the model itself, especially when models memorize and reproduce training data. We formalize this challenge as Member vs Generated Inference (MGI): given a sample and a target generative model, infer whether the sample is a true training member or a generated output of that model. Focusing on image generation, we show that existing membership inference methods systematically misclassify generated samples as training members, while attribution-based methods often misclassify true members as generated. This failure arises because both approaches rely on likelihood-related signals that are similarly elevated for training examples and for the model’s own outputs. To address MGI, we propose Data Circuit Breaker (DCB), a three-stage method that combines complementary signals from a generative model’s autoencoder and latent generator to distinguish training members from generated samples. Across multiple generative models, including image autoregressive and diffusion models, DCB consistently addresses the shortcomings of membership inference and attribution methods, remains effective even when models reproduce near-duplicates of training samples, and generalizes to challenging model derivative settings in which new models are trained on generated data.

## 1 Introduction

![Refer to caption](https://arxiv.org/html/2606.23872v1/x1.png)

Figure 1: Overview of the new Member vs Generated Inference (MGI) task. The core challenge is separating genuine training membership from model generation, even across chains of models trained on generated data. Let $N = N_M \cup N_N$ denote a natural dataset, where $N_M \cap N_N = \varnothing$. A generative model $\mathcal{M}_1$ is trained on the member set $N_M$, while $N_N$ is held out as natural non-member data. After training, $\mathcal{M}_1$ produces a generated dataset $G = G_M \cup G_N$, with $G_M \cap G_N = \varnothing$. Here, $G_M$ and $G_N$ are both generated by $\mathcal{M}_1$ and therefore follow the same generated-data distribution, but they play different roles in downstream settings: $G_M$ is used to train a new model $\mathcal{M}_2$, whereas $G_N$ is withheld and serves as generated non-member data for $\mathcal{M}_2$. The new model $\mathcal{M}_2$ is thus trained on generated members $G_M$ rather than natural members $N_M$. In turn, $\mathcal{M}_2$ generates a new dataset $G' = G'_M \cup G'_N$, where $G'_M \cap G'_N = \varnothing$; samples in $G'_M$ may be used to train further downstream models such as $\mathcal{M}_3$, while $G'_N$ remains withheld. Under this setup, MGI asks whether a given sample should be attributed to training data or to model-generated data. For the original model $\mathcal{M}_1$, the task is to distinguish among $\{N_M, N_N, G\}$, separating true natural training members $N_M$ from natural non-members $N_N$ (as in the canonical membership inference task) and from the samples $G$ generated by $\mathcal{M}_1$. For the derivative model $\mathcal{M}_2$, the task becomes distinguishing among $\{G_M, G_N, G'\}$, separating generated training members $G_M$ from both generated non-members $G_N$ and the samples $G'$ generated by $\mathcal{M}_2$. We can further incorporate the natural samples $N$ as non-member data for $\mathcal{M}_2$; however, $G_N$ represents the most difficult case of non-member data.

Generative models are now trained on massive internet data and generate high-quality samples at an unprecedented speed. These models also inadvertently memorize some of their individual training inputs and later recreate them as outputs \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[3](https://arxiv.org/html/2606.23872#bib.bib108)\]. The fact that the outputs from generative models are indistinguishable from real data blurs the boundary between a model’s training and generated data. We formalize this challenge as the Member vs Generated Inference (MGI) task: given an image and a target generative model, decide whether the sample is a true training member of that model or a generated output by the same model. We illustrate the MGI task in the overview [Figure 1](https://arxiv.org/html/2606.23872#S1.F1) for a *direct training* and a *model derivative* setting. In the *direct training* setting with model $\mathcal{M}_1$, the goal is to distinguish natural training members $N_M$ from images $G$ generated by the model. Even in this seemingly simple setting, MGI is fundamentally harder than standard membership inference: generated images are optimized under the same latent distribution as training members, causing their likelihood-based scores to overlap heavily, as we demonstrate in [Section 4](https://arxiv.org/html/2606.23872#S4).

We further explore a more challenging and practically relevant *model derivative* setting, where the samples generated by $\mathcal{M}_1$ are (potentially published online, then scraped from the internet, and) used to train the subsequent model version $\mathcal{M}_2$. In this regime, members are no longer purely natural samples, and simply separating natural from generated content is insufficient. Both membership inference and attribution methods degrade further in the $\mathcal{M}_2$ setting, where generated training data introduces compounding ambiguity between membership and generation signals.

Focusing on image generation, we first show that existing membership inference methods \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[29](https://arxiv.org/html/2606.23872#bib.bib129),[34](https://arxiv.org/html/2606.23872#bib.bib110)\] are inadequate for MGI: they are designed to separate training members from held-out natural data, and consequently tend to incorrectly label model-generated (but non-member) samples as members. Conversely, attribution methods that aim to determine whether a sample was generated by a particular generative model \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\] are also insufficient, often failing by labeling training members as generated. Both failures stem from the same underlying cause: the outputs of powerful modern generative models are derived directly from those models’ own training samples. As a result, signals based on likelihood or output probabilities are similarly high for both true members and the models’ own outputs, breaking the assumptions underlying prior methods.

To address the MGI challenge for modern image generative models, we propose a new method, Data Circuit Breaker (DCB). Our DCB method treats the generation pipeline holistically rather than focusing solely on the latent generator. The key insight is that while the latent generator produces high scores for both members and generated samples, the autoencoder introduces measurable artifacts: generated samples, having passed through the full encode-decode pipeline, exhibit lower reconstruction and quantization errors than natural data points under the autoencoder. DCB exploits this by proceeding in three stages: (1) an autoencoder-based filtering step that identifies generated samples, separating them from non-generated data points; (2) a membership inference step on the non-generated samples using the latent generator, where the standard assumption that members score higher than non-members is restored; and (3) a cross-generator attribution step that compares conditional log-probabilities across multiple model versions to distinguish among the generated samples from different generators. Together, these stages enable DCB to solve MGI even in the most difficult cases of training data memorization.

Footnote 1: A circuit breaker is an electrical safety device designed to protect an electrical circuit from damage caused by current in excess of that which the equipment can handle. In our case, DCB can protect new models, for example, from degrading in performance by preventing their training on significant amounts of their own generated data.
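The three-stage flow described above can be sketched as a small decision function. This is only an illustrative sketch, not the authors' implementation: the `SampleScores` container, the two thresholds, and the argmax rule used for cross-generator attribution are all assumptions introduced here, and the scores are made-up numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SampleScores:
    """Hypothetical per-sample scores (names are assumptions, not from the paper)."""
    ae_score: float                  # autoencoder score; lower = more self-consistent
    cpd_per_model: Sequence[float]   # latent-generator score under each candidate model


def dcb_decide(scores, ae_threshold, member_threshold):
    """Toy three-stage decision; thresholds and the argmax rule are assumptions."""
    # Stage 1: a low autoencoder error flags the sample as generated.
    if scores.ae_score < ae_threshold:
        # Stage 3: attribute the generated sample to the model that scores it highest.
        best = max(range(len(scores.cpd_per_model)),
                   key=lambda i: scores.cpd_per_model[i])
        return f"generated_by_model_{best + 1}"
    # Stage 2: membership inference on the remaining non-generated samples,
    # using the score of the target (first) model.
    return "member" if scores.cpd_per_model[0] > member_threshold else "non_member"


natural_member = SampleScores(ae_score=5.0, cpd_per_model=[3.0, 0.5])
natural_non_member = SampleScores(ae_score=5.5, cpd_per_model=[0.2, 0.1])
generated = SampleScores(ae_score=0.4, cpd_per_model=[1.0, 2.5])

labels = [dcb_decide(s, ae_threshold=1.0, member_threshold=1.5)
          for s in (natural_member, natural_non_member, generated)]
print(labels)
```

The point of the ordering is that the membership threshold is only applied after likely-generated samples have been filtered out, which is where the usual "members score higher" assumption holds.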

Overall, our contributions are as follows:

1. New task. We introduce the *Member-vs-Generated Inference* (MGI) task, which asks whether a given sample is a true training member of a generative model or an output example generated by that same model.
2. Limits of prior work. We demonstrate that existing approaches are insufficient for MGI: membership inference methods systematically misclassify generated samples as members, while attribution methods often incorrectly label training members as generated.
3. Method. We propose DCB (Data Circuit Breaker), a three-stage procedure that exploits autoencoder self-consistency to filter generated samples, latent-generator scores for membership inference, and cross-generator probability discrepancies to trace data circuits across model versions.
4. Memorization robustness. We show that DCB remains effective even under verbatim memorization, distinguishing original training samples from their regurgitated (near-duplicate) generated counterparts.

## 2 Background and Related Work

Image Generative Models (IGMs). The dominant families of modern image generative models (IGMs) are *diffusion models* (DMs) and *image autoregressive models* (IARs). Many state-of-the-art IGMs in both families generate images in a *latent space*: an encoder first maps a high-resolution image from pixel space to a latent representation, and a decoder maps the synthesized latent back to pixels. While they share the latent-generation pipeline, DMs and IARs differ fundamentally in how they represent and sample from the data distribution. DMs define an *implicit* generative process via iterative denoising, whereas IARs *explicitly* factorize the likelihood by predicting token probabilities sequentially, similarly to large language models (LLMs).

Diffusion Models (DMs). DMs synthesize images by transforming Gaussian noise into a structured sample through a learned denoising procedure \[[21](https://arxiv.org/html/2606.23872#bib.bib137),[8](https://arxiv.org/html/2606.23872#bib.bib138)\]. Generation starts from $x_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and proceeds for $T$ steps, iteratively predicting the noise $\epsilon_{\mathcal{G}}(\mathbf{x}_t, t, \mathbf{c})$ for $t = T, \ldots, 1$ and then removing it. In conditional settings (e.g., class-to-image or text-to-image), the denoiser is conditioned on auxiliary inputs $\mathbf{c}$, typically text embeddings produced by pretrained encoders such as CLIP \[[14](https://arxiv.org/html/2606.23872#bib.bib141)\]. Conditioning is injected through cross-attention layers \[[25](https://arxiv.org/html/2606.23872#bib.bib142)\].

Image Autoregressive Models (IARs). IARs generate images by predicting discrete latent tokens one by one using a next-token autoregressive model, directly modeling a factorized distribution over the latent sequence. A typical IAR consists of (1) a vector-quantized VAE (VQ-VAE) that encodes an image into discrete representations from a codebook, and (2) an autoregressive transformer that models the codebook representations as tokens and samples them sequentially. For example, LlamaGen \[[22](https://arxiv.org/html/2606.23872#bib.bib135)\] uses a VQ-based autoencoder to produce quantized features, then applies a Llama-style transformer to generate tokens autoregressively. VAR further introduces a multi-scale VQ representation to enable coarse-to-fine synthesis \[[23](https://arxiv.org/html/2606.23872#bib.bib144)\]. Randomized autoregressive models (RARs) generalize next-token prediction by training with randomized token orderings and an annealing-based procedure \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

Membership Inference Attack (MIA). MIA aims to determine whether a given data point was part of a model’s training set or not \[[19](https://arxiv.org/html/2606.23872#bib.bib148),[18](https://arxiv.org/html/2606.23872#bib.bib149)\]. MIA methods are used for auditing models’ privacy leakage and for empirically verifying differential privacy guarantees \[[12](https://arxiv.org/html/2606.23872#bib.bib150),[17](https://arxiv.org/html/2606.23872#bib.bib151)\]. Recent work on MIA against IGMs \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[29](https://arxiv.org/html/2606.23872#bib.bib129),[34](https://arxiv.org/html/2606.23872#bib.bib110)\] shows that comparing an image’s conditional generation to its unconditional generation provides an effective signal for deciding whether the model was trained on that image (member) or not (non-member). Thus, the attack considers only the problem of differentiating between train and test samples and does not consider the data generated by the target IGMs. The signal in MIA can be improved by leveraging shadow models that are trained on data from the same distribution. LiRA \[[2](https://arxiv.org/html/2606.23872#bib.bib12)\] uses the shadow models to estimate the sample’s loss distribution for members and non-members, while RMIA \[[33](https://arxiv.org/html/2606.23872#bib.bib115)\] compares the likelihood ratio of the target sample with those of reference population samples.

Image Attribution Methods. In contrast to MIAs, image attribution methods seek to identify whether a given image was *generated* by a model or not, which is critical for tracing generated content and preventing data circuits that lead to model collapse \[[1](https://arxiv.org/html/2606.23872#bib.bib132),[20](https://arxiv.org/html/2606.23872#bib.bib131)\]. Analogously to MIAs for image autoregressive models \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[29](https://arxiv.org/html/2606.23872#bib.bib129)\], PRADA \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\] shows that the probability ratio can also carry information about whether an image is generated, i.e., a member of the model’s learned distribution, or not generated by the target model. However, the evaluation of the PRADA method is limited to distinguishing generated samples from held-out test samples, which is substantially easier than our newly defined setting: differentiating generated outputs from member training samples. Additionally, PRADA considers only IARs and relies exclusively on the per-token probabilities returned by the image latent generator. As a result, it does not exploit informative signals available in the models’ autoencoders, such as the quantization loss between generated and natural (e.g., train or test) samples \[[35](https://arxiv.org/html/2606.23872#bib.bib153)\], leaving part of the membership-related information available in IGMs unexploited.

Data Memorization. Memorization describes the extent to which a model retains information from its training data. It can be *unintended*, when the model stores details about individual examples that can later be reproduced or extracted \[[3](https://arxiv.org/html/2606.23872#bib.bib108),[11](https://arxiv.org/html/2606.23872#bib.bib128)\]. *Intended* memorization occurs when the model encodes general, reusable patterns that support generalization \[[7](https://arxiv.org/html/2606.23872#bib.bib106),[27](https://arxiv.org/html/2606.23872#bib.bib124)\]. For data provenance, the most challenging setting arises when a generative model memorizes training images *verbatim* and subsequently regurgitates them during generation, as was shown for DMs \[[3](https://arxiv.org/html/2606.23872#bib.bib108)\] and IARs \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\], effectively collapsing the distinction between genuine training images and model-generated outputs. We show that our approach remains effective even in this extreme regime: despite near-duplicate visual content, IGM samples retain subtle generation-specific residuals that are imperceptible to humans yet detectable in the IGMs’ latent representations, enabling reliable discrimination between natural training images and generated images.

## 3 Member vs Generated Inference (MGI)

In this section, we formulate MGI and its threat model under both the direct training and the model derivative settings\.

Threat Model. The threat model of MGI initially follows that of membership inference and attribution tasks. Given a set of data points $D$ and access to a generative model $\mathcal{M}$, the goal is to differentiate the subset of member samples $D_M$ used to train $\mathcal{M}$, its generated samples $D_G$, and non-member samples $D_N$ that were neither used for training nor generated by the model. We formalize the task for both the *direct training* and the *model derivative* setting as follows.

Direct Training. The generative model $\mathcal{M}_1$ was trained on a natural dataset $N_M$, which we refer to as the natural members. As in the conventional MIA setting, there is also a natural dataset $N_N$ that was not used for training, the natural non-members. Under MGI, we additionally consider the generated data $G$, which was produced by $\mathcal{M}_1$. The goal of MGI is to distinguish between the generated samples $G$ and the members $N_M$. In [Section 4](https://arxiv.org/html/2606.23872#S4) we show that existing MIAs fail under this new task.

Model Derivatives. Given the continuous development of generative models and the ubiquity of generated content, we also consider the relevant scenario of model derivatives. While the samples $N_M$ were used to train the initial model version $\mathcal{M}_1$, its generated samples $G$ may end up being used to train a new model $\mathcal{M}_2$, yielding the set $G_M$ of generated members of $\mathcal{M}_2$ and, alongside it, the set $G_N$ of generated non-members. The second model version $\mathcal{M}_2$ produces new samples $G'$ that may end up in further model generations. This iterative training results in data circuits, where new models are derived directly from the previous ones, which can lead to model collapse \[[20](https://arxiv.org/html/2606.23872#bib.bib131),[1](https://arxiv.org/html/2606.23872#bib.bib132)\]. Under the MGI setting, we assume access to both $\mathcal{M}_1$ and $\mathcal{M}_2$, and the goal is to distinguish $N_M$ vs $G_M$ vs $G'$.

Model Composition. A generative model $\mathcal{M}$ consists of an autoencoder $\mathcal{A} = \mathcal{D} \circ \mathcal{E}$, pairing an encoder $\mathcal{E}$ with a decoder $\mathcal{D}$, and a latent generator $\mathcal{G}$. $\mathcal{M}$ can thus be defined as a triplet composition of $\mathcal{E}$, $\mathcal{D}$, and $\mathcal{G}$: $\mathcal{M} = \langle \mathcal{E}, \mathcal{D}, \mathcal{G} \rangle$.

## 4 Limitations of MIA and Attribution Methods

![Refer to caption](https://arxiv.org/html/2606.23872v1/x2.png) (a) The distribution of scores for the state-of-the-art MIA on IARs \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x3.png) (b) The distribution of scores for an IAR-generated image attribution \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\].

Figure 2: Distributions of scores for membership inference attack and image attribution on IARs. In all cases, differentiating between training (Train) and generated (Belonging) images is more difficult than differentiating between training (Train) and validation (Val) images. This indicates that the MGI (Member vs Generated Inference) task is more difficult than the MI task. The evaluated model is VAR \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\].

MIAs for IGMs \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[29](https://arxiv.org/html/2606.23872#bib.bib129),[34](https://arxiv.org/html/2606.23872#bib.bib110)\] are formulated for the classical setting of distinguishing *natural* training members from *natural* non-members, and they rely almost exclusively on likelihood-based or probability-based signals from the *latent generator*. A representative family of methods scores an input image $\mathbf{x}$, with conditioning $\mathbf{c}$ (e.g., a class label or a prompt), via a *conditional probability discrepancy* (CPD):

$$
\begin{split}
\Delta(\mathcal{M}, \mathbf{x}, \mathbf{c}) &= \log P_{\mathcal{M}}(\mathbf{x} \mid \mathbf{c}) - \log P_{\mathcal{M}}(\mathbf{x}) \\
&\approx \log P_{\mathcal{G}}(\mathcal{E}(\mathbf{x}) \mid \mathbf{c}) - \log P_{\mathcal{G}}(\mathcal{E}(\mathbf{x})),
\end{split}
\tag{1}
$$

where the approximation reflects the standard IGM decomposition into an encoder $\mathcal{E}$ (mapping pixels to latents) and a latent generative model $\mathcal{G}$ (assigning probabilities in latent space). The decision rule is obtained by thresholding $\Delta$, where members are expected to exhibit systematically larger discrepancies than non-members, as the model *remembers* these samples.
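As a minimal sketch of Equation (1) for an autoregressive latent generator, the CPD can be computed from the per-token probabilities that the generator assigns to the encoded image with and without conditioning. The probability lists and the threshold below are made-up illustrative values, and the function names are assumptions introduced here, not part of any published implementation.

```python
import math


def sequence_log_prob(token_probs):
    """Log-probability of a latent token sequence given per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def cpd(cond_token_probs, uncond_token_probs):
    """Conditional probability discrepancy: log P(z | c) - log P(z)."""
    return sequence_log_prob(cond_token_probs) - sequence_log_prob(uncond_token_probs)


def predict_member(delta, threshold):
    """Thresholding rule: larger discrepancies are attributed to members."""
    return delta > threshold


# Hypothetical probabilities of the observed tokens of one encoded image.
cond_probs = [0.9, 0.8, 0.7]     # with the class label or prompt
uncond_probs = [0.3, 0.4, 0.5]   # with the conditioning dropped

delta = cpd(cond_probs, uncond_probs)
print(delta, predict_member(delta, threshold=1.0))
```

Because the score depends only on the latent generator, a generated image that the model itself assigns high conditional probability to would pass the same threshold, which is the failure mode discussed in this section.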

This design implicitly treats the autoencoder $\mathcal{A}$ as a transparent component and largely discards signals that arise from the pixel-to-latent and latent-to-pixel mapping itself. However, the autoencoder $\mathcal{A}$ is a core component of modern IGMs: the encoder $\mathcal{E}$ (typically CNN-based) maps an image $\mathbf{x} \in \mathbb{R}^{H \times W \times 3}$ to a latent feature map $f \in \mathbb{R}^{\frac{H}{p} \times \frac{W}{p} \times C}$ via down-sampling by a factor $p$, and the corresponding decoder $\mathcal{D}$ reconstructs $\mathbf{x}$ from $f$. As we show later, these components encode artifacts that are not captured in the likelihood-only or probability-only tests on the latent generator $\mathcal{G}$.

In this paper, we consider SOTA MIAs for DMs and IARs that leverage the CPD. CLiD \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\] is an MIA for DMs that approximates the CPD by using the noise prediction loss to compute the Evidence Lower Bound (ELBO) of the log-likelihood. For IARs, we use the MIA from \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\], which is computed directly on the model probabilities; we call this method PIAR in the following. Additionally, we use ICAS \[[29](https://arxiv.org/html/2606.23872#bib.bib129)\], which treats classifier-free guidance as an implicit classifier and approximates $p(c \mid x)$, further weighting this probability to obtain a final score. Furthermore, we use PRADA \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\], which, while proposed for image attribution, is conceptually similar to MIAs at a high level: both exploit that a model that has *seen* the data, be it during training or generation, assigns a higher likelihood to that image. PRADA first computes a balanced ratio of the CPD and then uses this ratio in a linear scoring function to obtain a final per-image score.

![Refer to caption](https://arxiv.org/html/2606.23872v1/x4.png) (a) The distribution of scores for the state-of-the-art membership inference on IARs \[[29](https://arxiv.org/html/2606.23872#bib.bib129)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x5.png) (b) The distribution of scores for an IAR-generated image attribution \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x6.png) (c) The distribution of scores for the quantization and reconstruction error.

Figure 3: Distributions of scores for memorized training samples vs re-generated cases. State-of-the-art membership inference (a) and attribution (b) methods fail to distinguish memorized training samples (Memorized Training) from their verbatim generated counterparts (Verbatim Re-generated), whereas our approach (c) clearly separates the two. The evaluated model is RAR-XXL \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

### 4.1 CPD-based Methods Fall Short for MGI

Recent IGM-specific MIAs exploit classifier-free guidance \[[9](https://arxiv.org/html/2606.23872#bib.bib103)\], where the model is trained (and evaluated) both with conditioning (e.g., class/prompt) and without it, making $\Delta(\cdot)$ a natural statistic for measuring whether the model *remembers* a specific (prompt, image) pair. Yet, in the MGI setting, generated images are *also* optimized to score highly under the same latent generator that produced them. Consequently, $\Delta(\cdot)$ can be simultaneously large for both true members and the model’s own outputs, collapsing the separation that MIAs rely on. In short, likelihood-based or probability-based MIAs are well-suited to separating members from held-out *natural* data, but they are not designed to distinguish members from *model-generated* non-members, nor do they leverage potentially discriminative signals available in the autoencoder part of IGMs.

### 4.2 Case Study: Memorized Training Samples

A special case of our MGI task arises when the model regenerates images that closely resemble training samples, which are known as *memorized training samples*. We adopt the methodology proposed by \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\] to identify 169 memorized training samples for RAR-XXL \[[30](https://arxiv.org/html/2606.23872#bib.bib69)\], where each memorized sample features an SSCD \[[13](https://arxiv.org/html/2606.23872#bib.bib107)\] similarity score higher than 0.7.
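The thresholding step of this selection can be illustrated as follows. The sketch assumes that the similarity score is the cosine similarity between copy-detection descriptor vectors of a training image and a generated image; the three-dimensional descriptors are toy values, not real SSCD outputs, and the rest of the identification methodology of \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\] is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_memorized(train_descriptor, generated_descriptor, threshold=0.7):
    """Flag a training image as memorized if a generated image is similar enough."""
    return cosine_similarity(train_descriptor, generated_descriptor) > threshold


# Toy descriptors (assumed values for illustration only).
train_desc = [1.0, 0.0, 0.2]
near_duplicate = is_memorized(train_desc, [0.9, 0.1, 0.25])
unrelated = is_memorized(train_desc, [0.0, 1.0, 0.0])
print(near_duplicate, unrelated)
```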

Table 1: Performance of different methods for identifying memorized samples. The evaluated model is RAR-XXL.

Concretely, we treat the 169 memorized training images as the *original* samples and their corresponding RAR-XXL outputs as the *generated* samples. The resulting score distributions for membership inference and image attribution are shown in [Figure 3(a)](https://arxiv.org/html/2606.23872#S4.F3.sf1) and [Figure 3(b)](https://arxiv.org/html/2606.23872#S4.F3.sf2), respectively. We observe an even greater overlap between memorized training samples and their regenerated counterparts than in the standard MGI comparison. Crucially, despite this highly challenging scenario, our approach leverages multiple different attribution signals, reliably separates memorized training images from their generated counterparts, and substantially outperforms MIA-based methods, as shown in [Figure 3(c)](https://arxiv.org/html/2606.23872#S4.F3.sf3). Further results about the memorized training samples can be found in [Appendix E](https://arxiv.org/html/2606.23872#A5).

## 5 Proposed Data Circuit Breaker

For our proposed solution to MGI, we combine signals from (1) the *autoencoder* that maps between pixels and latents and (2) the *latent generator* that models the latent distribution. The key idea is that an image generated by a particular IGM tends to be *more self-consistent* with that model’s autoencoder and latent generator than any natural image or an image generated by a different model, while membership-specific effects (train vs held-out) are more reliably detected after filtering likely-generated samples.

### 5.1 Autoencoder Self-Consistency

Given the autoencoder $\mathcal{A}$, where the encoder $\mathcal{E}$ maps an image $\mathbf{x}$ from pixel space to a latent representation and the decoder $\mathcal{D}$ reconstructs the image from the latent representation, we define the reconstruction error:

$$
\mathcal{L}_{\text{Rec}}(\mathbf{x}) = \text{MSE}\big(\mathbf{x}, \mathcal{A}(\mathbf{x})\big) = \text{MSE}\big(\mathbf{x}, \mathcal{D} \circ \mathcal{E}(\mathbf{x})\big), \tag{2}
$$

where $\text{MSE}(\cdot, \cdot)$ is the mean squared error. Following AEDR \[[26](https://arxiv.org/html/2606.23872#bib.bib143)\], we use a *double reconstruction ratio* to normalize the loss:

$$
\rho_{\text{Rec}}(\mathbf{x}) = \frac{\mathcal{L}_{\text{Rec}}(\mathbf{x})}{\text{MSE}\big(\mathcal{A}(\mathbf{x}), \mathcal{A} \circ \mathcal{A}(\mathbf{x})\big)}. \tag{3}
$$

Intuitively, the denominator acts as a per-image baseline: if an image is well-aligned with the autoencoder manifold, the second reconstruction introduces hardly any loss, stabilizing the ratio across diverse content.
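The arithmetic of Equations (2) and (3) can be sketched on a toy example. Here the autoencoder is replaced by a hypothetical stand-in, a circular three-tap moving average over a one-dimensional signal, and a small `eps` is added to the denominator to avoid division by zero; both are assumptions made for this illustration. Since this stand-in is linear, the example shows only how the ratio is computed, not the separation between natural and generated images.

```python
def autoencode(signal):
    """Toy stand-in for A = D∘E: a circular 3-tap moving average (assumption)."""
    n = len(signal)
    return [(signal[i - 1] + signal[i] + signal[(i + 1) % n]) / 3.0 for i in range(n)]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def reconstruction_error(x):
    """Equation (2): MSE between the input and its reconstruction."""
    return mse(x, autoencode(x))


def double_reconstruction_ratio(x, eps=1e-12):
    """Equation (3): first-pass error normalized by the second-pass error."""
    once = autoencode(x)
    twice = autoencode(once)
    return mse(x, once) / (mse(once, twice) + eps)


x = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
l_rec = reconstruction_error(x)
rho = double_reconstruction_ratio(x)
print(l_rec, rho)
```

For this signal, the first pass shrinks the alternating pattern to one third of its amplitude and the second pass shrinks it by the same factor again, so the first-pass error is nine times the second-pass error.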

#### VQ\-VAE Quantization Error\.

For IARs, the autoencoder is typically a VQ-VAE, which introduces an additional, highly informative signal, namely *quantization error*. The reconstruction procedure of the VQ-VAE includes a quantization step, where $\mathcal{Q}$ maps a continuous latent representation of an image $\mathbf{x}$ to entries of a codebook. The inverse $\mathcal{Q}^{-1}$ reverts this process and maps codebook indices back to the latent representation. We define the quantization error $\mathcal{L}_{\text{Q}}(\mathbf{x})$ as:

$$\mathcal{L}_{\text{Q}}(\mathbf{x})=\text{MSE}\big(\mathcal{E}(\mathbf{x}),\mathcal{Q}^{-1}\circ\mathcal{Q}\circ\mathcal{E}(\mathbf{x})\big).\tag{4}$$

Images synthesized by a given IAR tend to incur smaller $\mathcal{L}_{\text{Q}}$ under that model's VQ-VAE than natural images or images generated by other IGMs, as only they are produced *through* the same discrete codebook. We therefore use the combined autoencoder attribution score

$$\mathcal{L}_{\mathcal{A}}(\mathbf{x})=\begin{cases}\rho_{\text{Rec}}(\mathbf{x})\cdot\mathcal{L}_{\text{Q}}(\mathbf{x}),&\text{(IAR / VQ-VAE)}\\ \rho_{\text{Rec}}(\mathbf{x}),&\text{(DM / VAE)}.\end{cases}\tag{5}$$
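
The quantization error of Eq. (4) and the combined score of Eq. (5) can be sketched as follows. The sketch assumes that the encoder output $\mathcal{E}(\mathbf{x})$ is already available as a list of latent vectors and that $\mathcal{Q}$ is a nearest-neighbour lookup in the codebook, which is the usual VQ-VAE convention; the two-entry codebook and all function names are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Q: map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Q^{-1}: map codebook indices back to latent vectors."""
    return [list(codebook[k]) for k in indices]


def quantization_error(latents: List[Vector], codebook: List[Vector]) -> float:
    """L_Q = MSE(E(x), Q^{-1}(Q(E(x)))), Eq. (4), with latents standing in for E(x)."""
    restored = dequantize(quantize(latents, codebook), codebook)
    n_values = sum(len(z) for z in latents)
    total = sum(squared_distance(z, q) for z, q in zip(latents, restored))
    return total / n_values


def autoencoder_score(rho_rec: float, l_q: Optional[float] = None) -> float:
    """Eq. (5): rho_Rec * L_Q for IAR / VQ-VAE, rho_Rec alone for DM / VAE."""
    return rho_rec if l_q is None else rho_rec * l_q


codebook = [[0.0, 0.0], [1.0, 1.0]]       # toy codebook (assumption)
on_codebook = [[1.0, 1.0], [0.0, 0.0]]    # latents that sit exactly on code vectors
off_codebook = [[0.9, 1.1], [0.2, -0.1]]  # latents that fall between code vectors

l_q_on = quantization_error(on_codebook, codebook)
l_q_off = quantization_error(off_codebook, codebook)
```

Latents that were produced through the codebook incur no quantization error in this toy setting, whereas latents that lie off the codebook incur a positive one, which mirrors the intuition behind $\mathcal{L}_{\text{Q}}$.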

#### Optional Encoder Refinement for IARs\.

The limited alignment between encoder and decoder in IARs introduces additional losses, and attribution can degrade. Following \[[35](https://arxiv.org/html/2606.23872#bib.bib153)\], we therefore optionally refine the encoder post hoc by fine-tuning $\hat{\mathcal{E}}$ to better invert the decoder $\mathcal{D}$. Fine-tuning is performed on a *disjoint* set of latent feature maps $\mathbf{z}$ that were generated by the IAR, with the inversion loss:

$$\mathcal{L}_{\text{Inv}}=\text{MSE}\big(\hat{\mathcal{E}}\circ\mathcal{D}(\mathbf{z}),\mathbf{z}\big),\tag{6}$$

which improves the stability of $\mathcal{L}_{\mathcal{A}}$ while preserving the post-hoc setting, as no changes are introduced to the latent generator.
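
To make the role of the inversion loss in Eq. (6) concrete, the following toy sketch refines a one-parameter encoder against a frozen one-parameter decoder. Everything here is a simplifying assumption: the scalar latents, the linear decoder $\mathcal{D}(z)=2z$, the linear encoder $\hat{\mathcal{E}}(x)=wx$, and the plain gradient-descent loop with its learning rate. It is not the fine-tuning recipe used for the real IAR encoders.

```python
import random
from typing import List

DECODER_GAIN = 2.0  # frozen toy decoder D(z) = 2 z (hypothetical)


def decoder(z: float) -> float:
    return DECODER_GAIN * z


def inversion_loss(w: float, latents: List[float]) -> float:
    """L_Inv = MSE(E_hat(D(z)), z) for the toy encoder E_hat(x) = w * x."""
    return sum((w * decoder(z) - z) ** 2 for z in latents) / len(latents)


def refine_encoder(w: float, latents: List[float], lr: float = 0.05, steps: int = 200) -> float:
    """Gradient descent on L_Inv; only the encoder weight w is updated."""
    for _ in range(steps):
        # d/dw of mean((w * D(z) - z)^2) = mean(2 * (w * D(z) - z) * D(z))
        grad = sum(2.0 * (w * decoder(z) - z) * decoder(z) for z in latents) / len(latents)
        w -= lr * grad
    return w


# Stand-in for latents generated by the IAR, disjoint from the evaluation data.
rng = random.Random(0)
latents = [rng.uniform(-1.0, 1.0) for _ in range(64)]

w_before = 0.3  # deliberately misaligned encoder
w_after = refine_encoder(w_before, latents)
```

In this toy case the refined weight approaches $1/2$, the exact inverse of the decoder gain, while the decoder itself is never modified.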

### 5.2 Cross-Generator Consistency

While the autoencoder attribution score is able to identify images that were likely generated by a given model family, it is insufficient to attribute the *exact latent generator*. Especially in the model derivative setting, all generated images are decoded by the same decoder, and the autoencoder attribution score remains identical for all of them. Therefore, we introduce additional generator-based features derived from conditional probability discrepancy. Given two candidate models $\mathcal{M}_1$ and $\mathcal{M}_2$ with latent generators $\mathcal{G}_1$ and $\mathcal{G}_2$, we form a two-dimensional feature vector from the conditional log-probabilities that the two latent generators assign to the encoded image:

$$\boldsymbol{\phi}(\mathbf{x},\mathbf{c})=\big(\log P_{\mathcal{G}_1}(\mathcal{E}(\mathbf{x})\mid\mathbf{c}),\log P_{\mathcal{G}_2}(\mathcal{E}(\mathbf{x})\mid\mathbf{c})\big).\tag{7}$$

Intuitively, this vector encodes information about the membership of $\mathbf{x}$ with respect to both latent generators $\mathcal{G}_1$ and $\mathcal{G}_2$.

As we have access to the image generative models $\mathcal{M}_1$ and $\mathcal{M}_2$, for which we want to infer the membership of given samples, we estimate class-conditional densities over $\boldsymbol{\phi}$ from reference samples drawn from each model (e.g., freshly generated sets with independent random seeds) and perform attribution via likelihood comparison (KDE in our implementation). This cross-model view provides separation even when absolute discrepancy values overlap, because images tend to be relatively more *consistent* with the generator that produced them. We construct a reference set $\mathcal{R}_G$ drawn from $G$ (representing $\mathcal{M}_1$-generated images) and a reference set $\mathcal{R}_{G'}$ drawn from $G'$ (representing $\mathcal{M}_2$-generated images). We then fit class-conditional KDE densities over the feature vectors of each reference set, where $K_h$ denotes a kernel with bandwidth $h$:

$$\hat{p}_{G}(\boldsymbol{\phi})=\frac{1}{|\mathcal{R}_{G}|}\sum_{i=1}^{|\mathcal{R}_{G}|}K_h\!\big(\boldsymbol{\phi}-\boldsymbol{\phi}_i^{G}\big),\tag{8}$$

$$\hat{p}_{G'}(\boldsymbol{\phi})=\frac{1}{|\mathcal{R}_{G'}|}\sum_{j=1}^{|\mathcal{R}_{G'}|}K_h\!\big(\boldsymbol{\phi}-\boldsymbol{\phi}_j^{G'}\big).\tag{9}$$
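
The density estimates of Eqs. (8) and (9) and the subsequent likelihood comparison can be sketched as follows. The kernel type and bandwidth are not fixed by the equations, so the isotropic Gaussian kernel and `h = 0.5` are assumptions, and the reference features are synthetic placeholders rather than log-probabilities from real latent generators.

```python
import math
import random
from typing import List, Tuple

Feature = Tuple[float, float]  # phi = (log P_G1(E(x)|c), log P_G2(E(x)|c))


def gaussian_kernel(dx: float, dy: float, h: float) -> float:
    """Isotropic 2-D Gaussian kernel K_h with bandwidth h (an assumed choice)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def kde(phi: Feature, reference: List[Feature], h: float) -> float:
    """p_hat(phi) = (1 / |R|) * sum_i K_h(phi - phi_i), as in Eqs. (8)-(9)."""
    total = sum(gaussian_kernel(phi[0] - r[0], phi[1] - r[1], h) for r in reference)
    return total / len(reference)


def attribute(phi: Feature, ref_g: List[Feature], ref_g_prime: List[Feature], h: float = 0.5) -> str:
    """Likelihood comparison: return the source whose estimated density at phi is higher."""
    return "G" if kde(phi, ref_g, h) >= kde(phi, ref_g_prime, h) else "G'"


def synthetic_reference(center: Feature, n: int, rng: random.Random) -> List[Feature]:
    """Hypothetical stand-in for the features of freshly generated reference images."""
    return [(rng.gauss(center[0], 0.3), rng.gauss(center[1], 0.3)) for _ in range(n)]


rng = random.Random(0)
ref_g = synthetic_reference((-2.0, -3.0), 200, rng)        # relatively more likely under G_1
ref_g_prime = synthetic_reference((-3.0, -2.0), 200, rng)  # relatively more likely under G_2

label_a = attribute((-2.1, -2.9), ref_g, ref_g_prime)
label_b = attribute((-3.2, -1.8), ref_g, ref_g_prime)
```

The decision depends only on which of the two densities is larger at $\boldsymbol{\phi}$, so it stays meaningful even when the absolute log-probabilities of the two sources overlap along a single axis.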

### 5.3 Attribution Protocol

We combine the above signals in a cascade designed for the MGI setting\.

#### Stage 1: Autoencoder\-based Filtering \(Generated vs\. Non\-Generated\)\.

We first apply the autoencoder score $\mathcal{L}_{\mathcal{A}}(\mathbf{x})$ to identify a high-confidence subset of *IGM-generated* samples and separate them from samples that are unlikely to be produced by the target pipeline.

#### Stage 2: Membership Inference on the Remaining Samples \(Member vs\. Non\-Member\)\.

On images that Stage 1 classifies as non-generated, we apply standard MIA-style scoring based on the latent generator (e.g., $\Delta(\mathcal{M},\mathbf{x},\mathbf{c})$ or ICAS) to distinguish training members from non-members. Restricting MIAs to this subset restores their core assumption (members vs. non-members) and substantially reduces false positives caused by model-generated images. With Stages 1 and 2, we address the MGI problem for the direct training setting of $\mathcal{M}_1$. Specifically, we instantiate our second stage with ICAS, which is the best-performing MIA for most models. We note that, although PRADA outperforms ICAS on RAR, it requires an extra calibration set. This is an extra advantage beyond our main setting and restricts the applicability of PRADA. Therefore, we choose ICAS rather than PRADA to instantiate Stage 2 for all models.

#### Stage 3: Source Attribution among Generators \(Data Circuits\)\.

In the $M_2$ setting of [Figure 1](https://arxiv.org/html/2606.23872#S1.F1), where training data may itself be generated, we additionally apply cross-model generator attribution using $\boldsymbol{\phi}(\mathbf{x},\mathbf{c})$ and reference sets from $\mathcal{M}_1$ and $\mathcal{M}_2$ to separate $\mathcal{M}_1$-generated samples used for training $\mathcal{M}_2$, $\mathcal{M}_1$-generated samples not used for training $\mathcal{M}_2$, and $\mathcal{M}_2$-generated samples. Combined with Stage 1 and Stage 2, this yields a practical decomposition into (1) natural members, (2) natural non-members, (3) generated samples attributed to a specific model, and (4) generated samples used for training downstream models, thus fully addressing MGI in the presence of data circuits.
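
The control flow of the cascade can be summarized in a short sketch. It is a simplification under several assumptions: the thresholds are hypothetical free parameters, a lower $\mathcal{L}_{\mathcal{A}}$ is taken to indicate a generated sample, a higher MIA score is taken to indicate a member, and Stage 3 is reduced to an optional callback that returns a source label.

```python
from typing import Callable, Optional


def dcb_cascade(
    ae_score: float,
    mia_score: float,
    generated_threshold: float,
    member_threshold: float,
    attribute_generator: Optional[Callable[[], str]] = None,
) -> str:
    """Toy three-stage decision for a single sample.

    ae_score:  autoencoder score L_A(x); assumed lower for generated samples.
    mia_score: latent-generator MIA score (e.g. ICAS); assumed higher for members.
    attribute_generator: optional Stage 3 callback returning a source label.
    """
    # Stage 1: filter samples that are self-consistent with the target autoencoder.
    if ae_score < generated_threshold:
        if attribute_generator is None:
            return "generated"
        # Stage 3: attribute the generated sample to a specific generator.
        return "generated:" + attribute_generator()
    # Stage 2: standard membership inference on the remaining (natural) samples.
    if mia_score > member_threshold:
        return "natural member"
    return "natural non-member"


decisions = [
    dcb_cascade(0.1, 0.0, generated_threshold=1.0, member_threshold=0.5),
    dcb_cascade(2.0, 0.9, generated_threshold=1.0, member_threshold=0.5),
    dcb_cascade(2.0, 0.1, generated_threshold=1.0, member_threshold=0.5),
    dcb_cascade(0.1, 0.0, 1.0, 0.5, attribute_generator=lambda: "M2"),
]
```

The ordering matters: the MIA score is consulted only after Stage 1 has removed likely-generated samples, which is what restores the member vs. non-member assumption for Stage 2.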

## 6 Empirical Evaluation

### 6.1 Experimental Setup

Models. We evaluate SOTA IARs and DMs, following previous work on MIAs for IGMs \[[11](https://arxiv.org/html/2606.23872#bib.bib128),[6](https://arxiv.org/html/2606.23872#bib.bib66),[34](https://arxiv.org/html/2606.23872#bib.bib110),[29](https://arxiv.org/html/2606.23872#bib.bib129)\]. Our selection of models requires access to their training sets so that we can verify the outcome of MIAs and MGI. We use VAR-d30 (d = model depth) \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\], RAR-XXL \[[30](https://arxiv.org/html/2606.23872#bib.bib69)\], and LlamaGen-XXL \[[22](https://arxiv.org/html/2606.23872#bib.bib135)\], trained for class-conditioned generation. We download the pre-trained weights from the corresponding repositories, and for generation we follow the settings recommended in the original works. For the DMs, we focus on the UNet-based \[[16](https://arxiv.org/html/2606.23872#bib.bib100)\] architectures Stable Diffusion 1.4 and 2.1 \[[15](https://arxiv.org/html/2606.23872#bib.bib74)\].

Datasets. As the above IARs were trained on the ImageNet-1k \[[5](https://arxiv.org/html/2606.23872#bib.bib91)\] dataset, we use it to perform our MGI and MIA tasks. We draw 1000 images from the training set as members and 1000 images from the validation set as non-members. For Stable Diffusion, we follow CLiD \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\] and first fine-tune the model on a set of 2500 MS-COCO images for 50k steps to obtain a set of natural member samples and non-member samples. Then we use 1000 samples from training as members and 1000 samples from validation as non-members.

Fine-tuning. We fine-tune the second model $\mathcal{M}_2$ on 5000 images generated by the first model $\mathcal{M}_1$. We denote these fine-tuning images generated by $\mathcal{M}_1$ as $G_M$. We also keep another 1000 images generated by $\mathcal{M}_1$ as a held-out set (denoted as $G_N$, which are *generated* data points that act as *non-members*). For IARs we fine-tune $\mathcal{M}_2$ for 5 epochs, while for DMs we use 20. The learning rate is $1\times 10^{-5}$ for all models. We provide the hyperparameter details in [Appendix A](https://arxiv.org/html/2606.23872#A1).

Baselines. MGI is a *newly defined task* without an existing solution. We follow standard practice for newly defined tasks by adapting SOTA methods from the closest domains, MIA and image attribution. Regarding the MIA baselines, we choose SOTA MIA methods for IARs and DMs, respectively. For IARs, we use the method of \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\], which we refer to as PIAR, and ICAS \[[29](https://arxiv.org/html/2606.23872#bib.bib129)\]. For the DMs, we use the SOTA MIA CLiD and extend the IAR-based ICAS to DMs based on the CLiD scores. In [Section 6.4](https://arxiv.org/html/2606.23872#S6.SS4), we further test the strong MIAs LiRA and RMIA, to which we give an *advantage* by training shadow models. For the direct training setting, MIAs assume that members have a higher score than all non-member samples, and we extend this assumption to MGI. Regarding the image attribution baseline, we consider PRADA \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\], which was originally proposed for IARs and which we extend to DMs. Under the direct training setting, image attribution methods assume that generated samples have the highest score, followed by members and non-members. The same intuition extends to the derivative setting.

Metrics. In the following, we focus on TPR@1%FPR, a metric that is especially relevant for inference tasks, and additionally provide the AUC in [Appendix C](https://arxiv.org/html/2606.23872#A3).
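
For reference, TPR@1%FPR can be computed from raw scores as in the sketch below. It assumes that higher scores indicate the positive class and that a sample is predicted positive when its score strictly exceeds the threshold; the function name and the toy scores are illustrative and not taken from our evaluation code.

```python
from typing import Sequence


def tpr_at_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float], max_fpr: float = 0.01) -> float:
    """True-positive rate at the lowest threshold whose FPR does not exceed max_fpr."""
    ranked = sorted(neg_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # number of false positives we may accept
    if allowed >= len(ranked):
        return 1.0  # every negative may be misclassified, so no threshold is needed
    # Predicting positive for score > threshold admits at most `allowed` negatives.
    threshold = ranked[allowed]
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


neg = [i / 200 for i in range(100)]  # 100 negative scores in [0, 0.495]
pos = [0.9, 0.8, 0.4, 0.95]          # 4 positive scores
tpr = tpr_at_fpr(pos, neg, max_fpr=0.01)
```

With 100 negatives, a 1% FPR budget allows one false positive, so the threshold is the second-largest negative score (0.49) and three of the four positives exceed it.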

### 6.2 Evaluation on the Direct Training Setting

Table 2: TPR@1%FPR for IARs in the direct training setting. Only DCB achieves consistent performance across all comparisons and models.

Table 3: TPR@1%FPR for DMs in the direct training setting. Only DCB achieves consistent performance across all comparisons and models.

First, we focus on the direct training setting, known from the MIA task, where the model $\mathcal{M}_1$ was trained on natural images, resulting in the natural members $N_M$ and natural non-members $N_N$. However, MGI introduces the model's generated samples $G$ as a new and important part of this task.

Under this new setting, we analyze the performance of the original MIAs and image attribution methods for IARs in [Table 2](https://arxiv.org/html/2606.23872#S6.T2) and report the TPR@1%FPR. We find that while existing MIAs are able to separate the natural members from the natural non-members, they break when the generated data is introduced. Our DCB, however, achieves near 100% TPR@1%FPR under the generated vs. natural setting and improves the average performance by over 36% (LlamaGen). Under the MGI task, DCB benefits from combining multiple signals, leading to consistent performance across detections.

We additionally analyze the MGI task on DMs in [Table 3](https://arxiv.org/html/2606.23872#S6.T3), following CLiD \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\] and fine-tuning the models on natural MS-COCO data to obtain natural members and non-members. Our results show that, similarly to the IARs, while the baseline methods are able to distinguish $N_N$ and $N_M$, they fail when the generated data is introduced. This difficulty for MIAs is additionally visualized in [Appendix B](https://arxiv.org/html/2606.23872#A2), which plots the score distributions for the different datasets. As the generated data was produced by $\mathcal{M}_1$, the MIA assigns it a high score, as the model *remembers* the sample. This occurs because likelihood-based scores are similarly elevated for both member samples and the model's own outputs, collapsing the separation that MIAs rely upon. Only DCB, which takes the full generative pipeline into account, can distinguish the generated data from the natural data. This effect is especially pronounced in the $\mathcal{M}_2$ setting, where DCB beats the baselines by more than 39% (SD2.1).

### 6.3 Evaluation on the Model Derivative Setting

As images generated by IGMs are widely reused, we shift the focus to the derivative setting, where a model $\mathcal{M}_2$ was fine-tuned on generated images $G_M$, which are now the member samples of $\mathcal{M}_2$. The new model continuously generates new images $G'$, which, together with the generated non-members $G_N$, introduces three datasets to the MGI task. In [Table 4](https://arxiv.org/html/2606.23872#S6.T4) we report the TPR@1%FPR for distinguishing the different datasets and find that the existing methods struggle to differentiate the distributions for both IARs and DMs. Our DCB, on the other hand, is able to clearly distinguish them.

The difficulty of this new MGI setting is reflected in the comparison of [Table 4](https://arxiv.org/html/2606.23872#S6.T4). While the baselines achieve reasonable performance for most *Natural vs Generated* cases, they struggle when differentiating within the generated samples. Particularly for the DMs, the detection performance collapses. The score distributions in [Figure A4](https://arxiv.org/html/2606.23872#A2.F4) and [Appendix B](https://arxiv.org/html/2606.23872#A2) provide additional insights as to why the MGI setting is fundamentally more difficult. Contrary to the MIA assumption, the score of the generated samples is larger than the score of non-members and overlaps with or exceeds the score of the members. This property of the generated samples is why standard MIAs fail.

Table 4: TPR@1%FPR for the model derivative setting. Most existing methods fail to attribute generated samples correctly.

### 6.4 Analysis for Strong MIA

Table 5: TPR@1%FPR for the strong MIA. We consider both LiRA and RMIA and train 5 shadow models.

We expand on the explored MIAs by analyzing stronger methods: we employ both LiRA \[[2](https://arxiv.org/html/2606.23872#bib.bib12)\] and RMIA \[[33](https://arxiv.org/html/2606.23872#bib.bib115)\] in the model derivative setting and show that, even with access to trained shadow models, the MIAs fail in the MGI setting. Both LiRA and RMIA require a scalar score to compute a one-dimensional probability distribution. We use the strongest probability-based MIA method, ICAS \[[29](https://arxiv.org/html/2606.23872#bib.bib129)\], to convert the token-wise probabilities predicted by IARs into a scalar score for each sample.

Concretely, we obtain 5 shadow models by fine-tuning $\mathcal{M}_1$ for 5 epochs, with the same hyperparameters used for $\mathcal{M}_2$, on datasets of 2500 samples randomly drawn from a 5000-sample shadow dataset. For RMIA, we utilize an additional population dataset of 1000 samples generated by $\mathcal{M}_1$ and set its core hyperparameter to $\alpha=0.3$. This setup enables the methods to estimate the distribution of generated member and non-member samples, giving these methods a strict advantage for the $G_N$ vs $G_M$ case compared to DCB. We report the TPR@1%FPR in [Table 5](https://arxiv.org/html/2606.23872#S6.T5) for VAR and RAR. The results highlight that even under significant advantages, strong MIAs are not sufficient to solve the MGI task. Specifically, for the *Natural vs. Generated* identification, the strong MIAs perform similarly to the MIAs without shadow models. Notably, our DCB consistently outperforms both LiRA and RMIA across all comparisons, highlighting that utilizing the full model pipeline provides stronger signals.
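
The shadow-model protocol can be illustrated with a small sketch. It shows only the subset sampling (5 subsets of 2500 out of 5000) and a LiRA-style Gaussian log-likelihood ratio on a scalar score; the per-sample Gaussian fit, the standard-deviation floor `min_std`, and the example scores are assumptions for illustration, and the actual shadow models and ICAS scores are not reproduced here.

```python
import math
import random
import statistics
from typing import List, Sequence, Set


def sample_shadow_subsets(pool_size: int = 5000, subset_size: int = 2500,
                          n_shadow: int = 5, seed: int = 0) -> List[Set[int]]:
    """Draw one random training subset of the shadow pool per shadow model."""
    rng = random.Random(seed)
    return [set(rng.sample(range(pool_size), subset_size)) for _ in range(n_shadow)]


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def lira_log_ratio(target_score: float, in_scores: Sequence[float],
                   out_scores: Sequence[float], min_std: float = 1e-3) -> float:
    """Log-likelihood ratio of the target score under Gaussians fitted to the
    shadow scores of models that did (IN) and did not (OUT) train on the sample."""
    mu_in, mu_out = statistics.fmean(in_scores), statistics.fmean(out_scores)
    # min_std is an added floor (assumption) because few shadow models give noisy variances.
    sd_in = max(statistics.pstdev(in_scores), min_std)
    sd_out = max(statistics.pstdev(out_scores), min_std)
    return gaussian_logpdf(target_score, mu_in, sd_in) - gaussian_logpdf(target_score, mu_out, sd_out)


subsets = sample_shadow_subsets()
# Hypothetical scalar scores of one sample under shadow models that saw it (IN) or not (OUT).
in_scores, out_scores = [0.85, 0.95, 0.90], [0.30, 0.40]
ratio_member_like = lira_log_ratio(0.90, in_scores, out_scores)
ratio_nonmember_like = lira_log_ratio(0.35, in_scores, out_scores)
```

A positive ratio favours membership and a negative ratio favours non-membership; with only 5 shadow models each Gaussian is fitted on very few points, which is one reason a variance floor or a shared variance is commonly used.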

## 7 Conclusions

We introduced Member vs Generated Inference (MGI), a new and strictly harder inference task than standard membership inference. MGI requires separating a generative model's training members from its own generated outputs, including in data-circuit settings where subsequent models are trained on generated data. We showed that existing membership inference and attribution methods are inadequate for MGI because modern generative models produce non-member samples that are closely tied to the training distribution, leading MIAs to systematically misclassify generated samples as members, while attribution methods mislabel true training members as generated. To address this, we proposed DCB, a multi-stage pipeline that covers the full generation process by leveraging complementary signals from the autoencoder and latent generator. By first identifying synthesized content and then distinguishing remaining training members from non-members, DCB is able to consistently outperform previous methods. We demonstrated that DCB remains effective even under memorization, separating original training samples from their regurgitated counterparts and enabling practical mitigation of harmful data circuits. Finally, we showed that DCB achieves a better detection rate than strong membership inference attacks such as LiRA and RMIA, highlighting that a holistic treatment of the full generative pipeline is essential to solve MGI.

## Acknowledgments

This research was funded by the Deutsche Forschungsgemeinschaft \(DFG, German Research Foundation\), Project number 550224287\. Franziska Boenisch received funding from the European Research Council \(ERC\) under the European Union’s Horizon Europe research and innovation programme \(grant agreement No 101220235\)\. We would like to acknowledge our sponsors, who support our research with financial and in\-kind contributions: OpenAI and G\-Research\. We also thank members of the SprintML group for their feedback\. Responsibility for the content of this publication lies with the authors\.

## References

- [1] S. Alemohammad, J. Casco-Rodriguez, L. Luzi, A. I. Humayun, H. Babaei, D. LeJeune, A. Siahkoohi, and R. Baraniuk (2024) Self-consuming generative models go MAD. In The Twelfth International Conference on Learning Representations.
- [2] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramèr (2022) Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897–1914. External Links: [Document](https://dx.doi.org/10.1109/SP46214.2022.9833649).
- [3] N. Carlini, J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace (2023) Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pp. 5253–5270.
- [4] S. Damm, J. Ricker, H. Petzka, and A. Fischer (2025) PRADA: probability-ratio-based attribution and detection of autoregressive-generated images. arXiv preprint arXiv:2511.20068.
- [5] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255.
- [6] J. Dubiński, A. Kowalczuk, F. Boenisch, and A. Dziedzic (2025) CDI: Copyrighted Data Identification in Diffusion Models. In The IEEE CVF Computer Vision and Pattern Recognition Conference (CVPR).
- [7] V. Feldman (2020) Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 954–959.
- [8] J. Ho, A. Jain, and P. Abbeel (2020) Denoising Diffusion Probabilistic Models. In Conference on Neural Information Processing Systems (NeurIPS), pp. 6840–6851.
- [9] J. Ho and T. Salimans (2022) Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.
- [10] N. Jovanović, I. Labiad, T. Souček, M. Vechev, and P. Fernandez (2026) Watermarking autoregressive image generation. Advances in Neural Information Processing Systems 38, pp. 71801–71848.
- [11] A. Kowalczuk, J. Dubiński, F. Boenisch, and A. Dziedzic (2025) Privacy attacks on image autoregressive models. In Forty-Second International Conference on Machine Learning (ICML).
- [12] B. Marek, L. Rossi, V. Hanke, X. Wang, M. Backes, F. Boenisch, and A. Dziedzic (2026) Benchmarking empirical privacy protection for adaptations of large language models. In The Fourteenth International Conference on Learning Representations. External Links: [Link](https://openreview.net/forum?id=jY7fAo9rfK).
- [13] E. Pizzi, S. D. Roy, S. N. Ravindra, P. Goyal, and M. Douze (2022) A self-supervised descriptor for image copy detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14532–14542.
- [14] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever (2021) Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pp. 8748–8763.
- [15] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer (2022) High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [16] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi (Eds.), Cham, pp. 234–241.
- [17] L. Rossi, B. Marek, F. Boenisch, and A. Dziedzic (2026) Natural identifiers for privacy and data audits in large language models. In The Fourteenth International Conference on Learning Representations. External Links: [Link](https://openreview.net/forum?id=doaAUf9Pi7).
- [18] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes (2019) Ml-leaks: model and data independent membership inference attacks and defenses on machine learning models. In Network and Distributed System Security Symposium (NDSS).
- [19] R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pp. 3–18.
- [20] I. Shumailov, Z. Shumaylov, Y. Zhao, N. Papernot, R. Anderson, and Y. Gal (2024) AI models collapse when trained on recursively generated data. Nature 631 (8022), pp. 755–759.
- [21] Y. Song and S. Ermon (2020) Improved Techniques for Training Score-Based Generative Models. In Conference on Neural Information Processing Systems (NeurIPS), pp. 12438–12448.
- [22] P. Sun, Y. Jiang, S. Chen, S. Zhang, B. Peng, P. Luo, and Z. Yuan (2024) Autoregressive model beats diffusion: llama for scalable image generation. arXiv preprint arXiv:2406.06525.
- [23] K. Tian, Y. Jiang, Z. Yuan, B. Peng, and L. Wang (2024) Visual autoregressive modeling: scalable image generation via next-scale prediction. Advances in neural information processing systems 37, pp. 84839–84865.
- [24] K. Tian, Y. Jiang, Z. Yuan, B. Peng, and L. Wang (2024) Visual autoregressive modeling: scalable image generation via next-scale prediction. External Links: 2404.02905, [Link](https://arxiv.org/abs/2404.02905).
- [25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Conference on Neural Information Processing Systems (NeurIPS), pp. 5998–6008.
- [26] C. Wang, K. Chen, Z. Yang, Y. Wang, and W. Zhang (2025) AEDR: training-free ai-generated image attribution via autoencoder double-reconstruction. arXiv preprint arXiv:2507.18988.
- [27] W. Wang, M. A. Kaleem, A. Dziedzic, M. Backes, N. Papernot, and F. Boenisch (2024) Memorization in self-supervised learning improves downstream generalization. In The Twelfth International Conference on Learning Representations (ICLR).
- [28] J. Yao, B. Yang, and X. Wang (2025) Reconstruction vs. generation: taming optimization dilemma in latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [29] H. Yu, Y. Qiu, Y. Yang, H. Fang, T. Zhuang, J. Hong, B. Chen, H. Wu, and S. Xia (2025) ICAS: detecting training data from autoregressive image generative models. In Proceedings of the 33rd ACM International Conference on Multimedia, pp. 11209–11217.
- [30] Q. Yu, J. He, X. Deng, X. Shen, and L. Chen (2024) Randomized autoregressive visual generation. External Links: 2411.00776, [Link](https://arxiv.org/abs/2411.00776).
- [31] Q. Yu, J. He, X. Deng, X. Shen, and L. Chen (2025) Randomized autoregressive visual generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 18431–18441.
- [32] S. Yu, S. Kwak, H. Jang, J. Jeong, J. Huang, J. Shin, and S. Xie (2025) Representation alignment for generation: training diffusion transformers is easier than you think. In International Conference on Learning Representations.
- [33] S. Zarifzadeh, P. Liu, and R. Shokri (2024) Low-cost high-power membership inference attacks. In Forty-first International Conference on Machine Learning.
- [34] S. Zhai, H. Chen, Y. Dong, J. Li, Q. Shen, Y. Gao, H. Su, and Y. Liu (2024) Membership inference on text-to-image diffusion models via conditional likelihood discrepancy. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. External Links: [Link](https://openreview.net/forum?id=DztaBt4wP5).
- [35] B. Zhao, L. Kerner, M. Meintz, T. Bakr, F. Boenisch, and A. Dziedzic (2026) Data provenance for image auto-regressive generation. In The Fourteenth International Conference on Learning Representations. External Links: [Link](https://openreview.net/forum?id=qYu4wj7O3z).
- [36] B. Zhao, L. Kerner, M. Meintz, T. Bakr, F. Boenisch, and A. Dziedzic (2026) Data provenance for image auto-regressive generation. In The Fourteenth International Conference on Learning Representations (ICLR).

## Appendix A Further Implementation Details

### A.1 Data Pre-processing

We follow the augmentations in the original training recipes of each model to ensure that our evaluation faithfully reflects the conditions under which membership and generation signals arise. For VAR and LlamaGen, when computing the MIA scores on $\mathcal{M}_1$, we apply the same data augmentations used during training: each image is first resized by a factor of $1.125$ for VAR and $1.1$ for LlamaGen, followed by a center crop to the model's native resolution ($256\times 256$ for VAR and $384\times 384$ for LlamaGen).
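
The resize-then-crop geometry described above can be sketched as follows. This is only an illustration: it assumes that "resized by a factor" means scaling the shorter side to `round(factor * native)` while preserving the aspect ratio, which the text does not spell out. The function name and the 500×375 example image are hypothetical.

```python
def resize_and_center_crop_box(width, height, native, factor):
    """Return ((new_w, new_h), (left, top, right, bottom)) for resize + center crop.

    Assumption (not stated in the paper): the shorter side is scaled to
    round(factor * native) and the aspect ratio is preserved.
    """
    target_short = round(factor * native)
    scale = target_short / min(width, height)
    new_w = max(native, round(width * scale))
    new_h = max(native, round(height * scale))
    # Center the native-resolution crop window inside the resized image.
    left = (new_w - native) // 2
    top = (new_h - native) // 2
    return (new_w, new_h), (left, top, left + native, top + native)


# Native resolutions and resize factors taken from the text above.
RECIPES = {"VAR": (256, 1.125), "LlamaGen": (384, 1.1)}

if __name__ == "__main__":
    for name, (native, factor) in RECIPES.items():
        size, box = resize_and_center_crop_box(500, 375, native, factor)
        print(name, "resized to", size, "crop box", box)
```

The crop box always spans exactly `native` pixels in each dimension, so the scored image has the resolution the model was trained on.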

### A.2 Fine-tuning

We provide the hyperparameters for fine-tuning $\mathcal{M}_2$ in the model derivative setting in [Table A1](https://arxiv.org/html/2606.23872#A1.T1). For all models, we fine-tune exclusively the *latent generator* while keeping the autoencoder weights frozen. The latent generator corresponds to the transformer in IARs and the UNet in the diffusion models. This design choice mirrors the common practice in which downstream practitioners adapt only the generative backbone to new data. The fine-tuning data consists of $5{,}000$ images generated by $\mathcal{M}_1$. For the IARs (VAR and RAR), $5$ epochs suffice to adapt the latent generator to the generated distribution, whereas for the diffusion models (SD 1.4 and SD 2.1) we train for $20$ epochs due to their slower convergence. All experiments use a fixed learning rate of $1\times 10^{-5}$ with the AdamW optimizer.

Table A1: Hyperparameters for fine-tuning $\mathcal{M}_2$ for the derivative setting.
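
The settings stated in the prose above can be collected in a small configuration object. This is a sketch, not the authors' training code: the class and function names are hypothetical, and only values that appear in the text are included (any further entries of Table A1, such as batch size, are not reproduced here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    model: str
    epochs: int
    num_generated_images: int = 5000    # images sampled from M1
    learning_rate: float = 1e-5         # fixed learning rate
    optimizer: str = "AdamW"
    train_latent_generator: bool = True
    train_autoencoder: bool = False     # autoencoder weights stay frozen


def make_config(model: str) -> FineTuneConfig:
    """Pick the epoch count by model family, as described in the text."""
    iars = {"VAR", "RAR"}
    diffusion = {"SD 1.4", "SD 2.1"}
    if model in iars:
        return FineTuneConfig(model=model, epochs=5)
    if model in diffusion:
        return FineTuneConfig(model=model, epochs=20)
    raise ValueError(f"no fine-tuning recipe listed for {model!r}")


if __name__ == "__main__":
    for name in ("VAR", "RAR", "SD 1.4", "SD 2.1"):
        print(make_config(name))
```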

## Appendix B Distribution Visualization on More Models

In this section, we complement the distribution analysis of the main paper with visualizations for additional models and settings. These plots substantiate our central observation: across all evaluated architectures, the distributions for generated images overlap heavily with those for training members, making the MGI task fundamentally harder than standard membership inference.

### B.1 Model Direct Training Setting

RAR. [Figure A1](https://arxiv.org/html/2606.23872#A2.F1) shows the score distributions for PIAR \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\] and the PRADA image attribution method \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\] on RAR-XXL. For both scoring functions, the generated (“Belonging”) distribution is substantially closer to the training distribution than the held-out validation distribution is. This confirms that the conditional probability discrepancy used by existing MIAs cannot reliably separate members from generated samples in the RAR architecture.

![Refer to caption](https://arxiv.org/html/2606.23872v1/x7.png) (a) Distribution of scores for the membership inference attack PIAR \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x8.png) (b) Distribution of scores for the IAR-generated image attribution method PRADA \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\].

Figure A1: Distributions of scores for membership inference and image attribution on RAR-XXL \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

LlamaGen. A similar pattern emerges for LlamaGen-XXL ([Figure A2](https://arxiv.org/html/2606.23872#A2.F2)). Here, the overlap between the generated and training distributions is even more pronounced under the MIA score, with the generated distribution shifted further toward the member region compared to the validation distribution. The PRADA attribution score provides somewhat better separation, yet a significant fraction of generated samples still falls within the range of training member scores, highlighting the inadequacy of likelihood-based methods alone for the MGI task.

![Refer to caption](https://arxiv.org/html/2606.23872v1/x9.png) (a) Distribution of scores for the membership inference attack PIAR \[[11](https://arxiv.org/html/2606.23872#bib.bib128)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x10.png) (b) Distribution of scores for an IAR-generated image attribution method \[[4](https://arxiv.org/html/2606.23872#bib.bib147)\].

Figure A2: Distributions of scores for membership inference and image attribution on LlamaGen-XXL \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\].

Stable Diffusion 1.4. We extend the analysis to diffusion models in [Figure A3](https://arxiv.org/html/2606.23872#A2.F3), which plots the CLiD MIA score \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\] distributions for both $\mathcal{M}_1$ and $\mathcal{M}_2$. For $\mathcal{M}_1$, the generated images exhibit score distributions that overlap substantially with training members, consistent with the findings on IARs. In the $\mathcal{M}_2$ setting, the additional fine-tuning on generated data introduces even more complex membership signals, resulting in a more entangled set of distributions. These results motivate the multi-stage approach of DCB, which leverages complementary autoencoder-based signals to disentangle these overlapping distributions.

![Refer to caption](https://arxiv.org/html/2606.23872v1/x11.png) (a) Distribution of scores for the state-of-the-art MIA for diffusion models, CLiD \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\]. The evaluated model is $\mathcal{M}_1$.
![Refer to caption](https://arxiv.org/html/2606.23872v1/x12.png) (b) Distribution of scores for the state-of-the-art MIA for diffusion models, CLiD \[[34](https://arxiv.org/html/2606.23872#bib.bib110)\]. The evaluated model is $\mathcal{M}_2$.

Figure A3: Distributions of scores for the membership inference attack on diffusion models for both $\mathcal{M}_1$ and $\mathcal{M}_2$. The evaluated model is Stable Diffusion 1.4 (fine-tuned on MS-COCO).

### B.2 Model Derivative Setting

We additionally visualize the score distributions for the model derivative setting. [Figures A4](https://arxiv.org/html/2606.23872#A2.F4), [A5](https://arxiv.org/html/2606.23872#A2.F5) and [A6](https://arxiv.org/html/2606.23872#A2.F6) present three complementary views under the $\mathcal{M}_2$ scenario, each shown for both VAR and RAR-XXL: the latent-generator MIA score, the autoencoder reconstruction and quantization error, and the cross-generator probability discrepancy, respectively.

The MIA score distribution in [Figure A4](https://arxiv.org/html/2606.23872#A2.F4) reveals a complex mixture: generated members ($G_M$) and generated non-members ($G_N$) produce high MIA scores that overlap with or exceed those of natural members. This confirms that probability-based MIAs alone cannot distinguish the provenance of generated samples in the derivative setting.

In contrast, the autoencoder-based score in [Figure A5](https://arxiv.org/html/2606.23872#A2.F5) provides a clear separation between natural and generated images, as generated images exhibit lower reconstruction and quantization errors. However, it cannot distinguish among different sources of generated content (*i.e.*, $G_M$ vs. $G_N$ vs. $G'$).

The cross-generator probability discrepancy in [Figure A6](https://arxiv.org/html/2606.23872#A2.F6) addresses this gap: by comparing the conditional log-probabilities under $\mathcal{G}_1$ and $\mathcal{G}_2$, the feature vector $\mathbf{\phi}(\mathbf{x},\mathbf{c})$ reveals distinct clusters for images generated by $\mathcal{M}_1$ versus $\mathcal{M}_2$, enabling fine-grained attribution among generated sources. Together, these three complementary signals form the basis of the DCB pipeline and motivate its cascaded three-stage design.
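
The cascade motivated above can be sketched as a sequence of threshold checks. This is a simplified illustration under stated assumptions, not the DCB implementation: the thresholds `tau_ae`, `tau_mia`, and `tau_gen` are hypothetical placeholders, Stage 1 is reduced to a single cut on the autoencoder error, and Stage 3 is reduced to the sign of a scalar log-probability difference, with the assumption that a higher probability under $\mathcal{G}_2$ points to $\mathcal{M}_2$.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    autoencoder_error: float   # reconstruction + quantization error
    mia_score: float           # latent-generator membership score
    logp_g1: float             # conditional log-probability under G1
    logp_g2: float             # conditional log-probability under G2


def cascade(s: Scores, tau_ae: float, tau_mia: float, tau_gen: float) -> str:
    """Three-stage decision; all thresholds are hypothetical placeholders."""
    # Stage 1: generated images show lower autoencoder error than natural ones.
    if s.autoencoder_error >= tau_ae:
        # Stage 2: among natural images, fall back on the standard MIA score.
        return "natural member" if s.mia_score >= tau_mia else "natural non-member"
    # Stage 3: among generated images, compare the two latent generators.
    # Assumption: higher log-probability under G2 indicates an M2 output.
    discrepancy = s.logp_g2 - s.logp_g1
    return "generated by M2" if discrepancy >= tau_gen else "generated by M1"


if __name__ == "__main__":
    sample = Scores(autoencoder_error=0.1, mia_score=0.9, logp_g1=-6.0, logp_g2=-4.0)
    print(cascade(sample, tau_ae=0.5, tau_mia=0.5, tau_gen=0.0))
```

The ordering matters: the MIA score is consulted only after Stage 1 has removed generated images, since generated samples also receive high MIA scores.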

![Refer to caption](https://arxiv.org/html/2606.23872v1/x13.png) (a) Distribution for VAR-d30 \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x14.png) (b) Distribution for RAR-XXL \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

Figure A4: Distributions of scores for the state-of-the-art MIA on IARs \[[29](https://arxiv.org/html/2606.23872#bib.bib129)\] in the model derivative ($\mathcal{M}_2$) setting. With respect to the model $\mathcal{M}_2$, we assign the following labels to the datasets: Generated for $\mathcal{M}_2$-generated data, Train for the member data used for training $\mathcal{M}_2$ (including pre-training and fine-tuning), and Val for non-member validation data. For the model $\mathcal{M}_2$, generated members ($G_M$) and generated non-members ($G_N$) yield high MIA scores that overlap with or exceed those of natural members, so probability-based MIAs alone cannot distinguish the provenance of generated samples.

![Refer to caption](https://arxiv.org/html/2606.23872v1/x15.png) (a) Distribution for VAR-d30 \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x16.png) (b) Distribution for RAR-XXL \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

Figure A5: Distributions of scores for the autoencoder-based score in the model derivative ($\mathcal{M}_2$) setting. With respect to the model $\mathcal{M}_2$, we assign the following labels to the datasets: Generated for $\mathcal{M}_2$-generated data, Train for the member data used for training $\mathcal{M}_2$ (including pre-training and fine-tuning), and Val for non-member validation data. The evaluated autoencoder-based score is defined by [Equation 5](https://arxiv.org/html/2606.23872#S5.E5). The score cleanly separates natural from generated images, as generated images exhibit lower reconstruction and quantization errors, but it cannot distinguish among the different sources of generated content ($G_M$ vs. $G_N$ vs. $G'$).

![Refer to caption](https://arxiv.org/html/2606.23872v1/x17.png) (a) Distribution for VAR-d30 \[[24](https://arxiv.org/html/2606.23872#bib.bib70)\].
![Refer to caption](https://arxiv.org/html/2606.23872v1/x18.png) (b) Distribution for RAR-XXL \[[31](https://arxiv.org/html/2606.23872#bib.bib146)\].

Figure A6: Distributions of scores for the cross-generator probability discrepancy in the model derivative setting. With respect to the model $\mathcal{M}_2$, we assign the following labels to the datasets: Generated for $\mathcal{M}_2$-generated data, Train for the member data used for training $\mathcal{M}_2$ (including pre-training and fine-tuning), and Val for non-member validation data. Comparing the conditional log-probabilities under $\mathcal{G}_1$ and $\mathcal{G}_2$ reveals distinct clusters for images generated by $\mathcal{M}_1$ versus $\mathcal{M}_2$, enabling fine-grained attribution among generated sources. Together with the MIA and autoencoder signals, this score completes the three complementary scoring functions that motivate the cascaded design of DCB.

## Appendix C Additional Results

### C.1 Model Direct Training Setting

AUC. [Table A2](https://arxiv.org/html/2606.23872#A3.T2) and [Table A3](https://arxiv.org/html/2606.23872#A3.T3) report the AUC for IARs and DMs, respectively. DCB achieves an overall AUC of $98.0$ on IARs and $96.5$ on DMs, substantially outperforming all baselines. Notably, the advantage of DCB is most pronounced in the $N_M/G$ comparison, where existing MIAs exhibit very low AUC (*e.g.*, $7.8$ for PIAR and $0.9$ for ICAS on RAR), confirming that likelihood-based MIAs cannot separate members from generated samples. DCB achieves $100.0$ AUC on both RAR and LlamaGen for this critical comparison, demonstrating perfect separation through its autoencoder-based filtering stage. For DMs, PRADA degrades to below $52$ AUC across both SD 1.4 and SD 2.1, while DCB reaches at least $99.9$.

TPR@5%FPR. [Table A4](https://arxiv.org/html/2606.23872#A3.T4) and [Table A5](https://arxiv.org/html/2606.23872#A3.T5) present the TPR at 5% FPR. The trends are consistent with the AUC results: DCB achieves an overall TPR@5%FPR of $92.7$ on IARs and $85.4$ on DMs, compared to the best baseline scores of $71.7$ (PRADA on IARs) and $50.5$ (CLiD/ICAS on DMs). The improvement is especially significant in the $N_M/G$ column, where DCB achieves $100.0\%$ TPR@5%FPR on RAR and LlamaGen while all baselines remain below $87\%$. For the $N_M/N_N$ comparison, DCB matches the best baseline in each case, as it falls back on the standard MIA score in Stage 2 of the pipeline after filtering out generated samples.
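
For readers less familiar with the two metrics, the following sketch computes them from raw scores. It is a plain reference computation, not the evaluation code used for the tables; the function names and toy scores are illustrative, and the results are fractions in $[0, 1]$, whereas the tables report values on a 0 to 100 scale.

```python
def roc_auc(pos, neg):
    """AUC as P(score_pos > score_neg), counting ties as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(pos, neg, max_fpr=0.05):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr.

    A sample is predicted positive when its score is strictly above the threshold.
    """
    best = 0.0
    for t in set(pos) | set(neg) | {float("-inf")}:
        fpr = sum(n > t for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p > t for p in pos) / len(pos)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    members = [0.9, 0.8, 0.7, 0.4]       # toy scores for the positive class
    non_members = [0.6, 0.3, 0.2, 0.1]   # toy scores for the negative class
    print("AUC:", roc_auc(members, non_members))
    print("TPR@5%FPR:", tpr_at_fpr(members, non_members, 0.05))
```

An AUC far below one half, such as the single-digit values quoted above, means the score ranks the positive class below the negative class for most pairs, which is what happens when generated samples receive higher membership scores than true members.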

Table A2: AUC for IARs in the direct training setting.

Table A3: AUC for DMs in the direct training setting.

Table A4: TPR@5%FPR for IARs in the direct training setting.

Table A5: TPR@5%FPR for DMs in the direct training setting.

Table A6: AUC for the model derivative setting.

### C.2 Model Derivative Setting

AUC. [Table A6](https://arxiv.org/html/2606.23872#A3.T6) reports the AUC across all pairwise comparisons in the model derivative setting. DCB achieves the highest overall AUC for every model: $99.5$ (VAR), $99.8$ (RAR), $97.8$ (SD 1.4), and $97.5$ (SD 2.1). The baselines clearly fail on the challenging comparisons *among generated samples* ($G_M/G'$ and $G_N/G'$), where only DCB's cross-generator probability discrepancy (Stage 3) provides reliable separation. For example, on VAR, PRADA achieves only $2.9$ AUC for $G_M/G'$, while DCB reaches $99.9$. On the diffusion models, the performance gap is particularly clear in the “Among Generated” columns: CLiD/ICAS achieve $16.4$ and $12.8$ AUC for $G_M/G'$ on SD 1.4 and SD 2.1, respectively, whereas DCB attains $91.6$ and $90.6$.

TPR@5%FPR. [Table A7](https://arxiv.org/html/2606.23872#A3.T7) presents the TPR@5%FPR for the same setting. The results are consistent with the AUC analysis. DCB achieves an overall TPR@5%FPR of $98.3$ (VAR), $99.1$ (RAR), $91.3$ (SD 1.4), and $90.2$ (SD 2.1), outperforming the strongest baselines by margins ranging from $0.9$ (RAR, vs. ICAS at $90.4$) to $25.6$ (VAR, vs. ICAS at $88.7$) percentage points. The “Among Generated” comparisons exhibit the largest gaps: for $G_M/G'$ on RAR, PRADA achieves $0.1\%$ while DCB reaches $100.0\%$; for $G_N/G'$ on SD 1.4, CLiD/ICAS achieve $68.4\%$ while DCB achieves $100.0\%$. These results confirm that the multi-stage architecture of DCB is essential for resolving the fine-grained attribution challenges posed by the model derivative setting.

Table A7: TPR@5%FPR for the model derivative setting.

Table A8: Robustness of Stage 1 ($\mathcal{L}_{\text{Q}}$). We show TPR@1%FPR on RAR.

## Appendix D Results on More Models

We further evaluate our approach and the baselines in the direct training setting for two SoTA diffusion models, REPA \[[32](https://arxiv.org/html/2606.23872#bib.bib157)\] and Lightning DiT \[[28](https://arxiv.org/html/2606.23872#bib.bib158)\]. [Table A9](https://arxiv.org/html/2606.23872#A4.T9) shows that our approach generalizes effectively to both models, outperforming the baselines.

Table A9: DCB on state-of-the-art diffusion models, REPA and Lightning DiT. We report the AUC in the direct training setting.

Table A10: Cross-architecture setting.

Table A11: Prompt estimation. Captions from BLIP2/LLaVA replace ground-truth (GT) prompts in [Table 4](https://arxiv.org/html/2606.23872#S6.T4).

Table A12: Performance of Stage 1 on LlamaGen with and without the optional fine-tuning. The metric is TPR@1%FPR.

## Appendix E Additional Results for Memorized Samples

We provide additional visualizations for the memorization case study discussed in [Section 4.2](https://arxiv.org/html/2606.23872#S4.SS2) of the main paper. [Figure A7](https://arxiv.org/html/2606.23872#A5.F7) shows a representative example of a memorized training sample from RAR-XXL alongside its corresponding re-generated output. Although the two images share nearly identical visual content (their SSCD similarity score of $0.827$ is well above the $0.7$ threshold used to identify memorized samples), they are not pixel-identical. The re-generated image has passed through the full generation pipeline (autoregressive token sampling and decoding), which introduces subtle generation-specific artifacts. These artifacts are imperceptible to the human eye but are reliably captured by our autoencoder-based attribution score $\mathcal{L}_A$, as shown in [Figure 3(c)](https://arxiv.org/html/2606.23872#S4.F3.sf3) of the main paper, where the quantization and reconstruction error distributions for memorized training images and their re-generated counterparts are well separated.

This example illustrates the most challenging case for data provenance: the generated image is a near-duplicate of the training sample, yet it was produced through the model's generative process rather than directly copied. Standard MIAs, which rely on the latent generator's probability scores, assign nearly identical scores to both the original and the re-generated version ([Figure 3(a)](https://arxiv.org/html/2606.23872#S4.F3.sf1)), since both are largely consistent with the learned distribution. In contrast, DCB's autoencoder-based filtering stage detects the generation artifacts introduced by the encode-decode pipeline, enabling reliable separation even in this extreme memorization regime. The quantitative performance on all $169$ identified memorized samples is reported in [Table 1](https://arxiv.org/html/2606.23872#S4.T1) of the main paper, where DCB achieves $97.5$ AUC and $93.5\%$ TPR@5%FPR, compared to at most $61.8$ AUC and $3.0\%$ TPR@5%FPR for the best baseline.
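
The memorization criterion used above (a similarity score above $0.7$) can be illustrated with a short sketch. It assumes that the SSCD score is the cosine similarity between two descriptor vectors and that the descriptors have already been extracted by an SSCD model; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_memorized(train_descriptor, generated_descriptor, threshold=0.7):
    """Flag a pair as memorized when similarity exceeds the threshold (0.7 in the text)."""
    return cosine_similarity(train_descriptor, generated_descriptor) > threshold


if __name__ == "__main__":
    # Toy descriptors; real SSCD descriptors are high-dimensional.
    print(is_memorized([3.0, 4.0], [4.0, 3.0]))   # similarity 0.96
    print(is_memorized([1.0, 0.0], [0.0, 1.0]))   # similarity 0.0
```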

![Refer to caption](https://arxiv.org/html/2606.23872v1/fig/mem_vis/rar_xxl_mem201141_label996_cropped.png) (a) The real training image for the memorized sample.
![Refer to caption](https://arxiv.org/html/2606.23872v1/fig/mem_vis/rar_xxl_mem201141_label996_cosine30_0.82680.png) (b) The re-generated image for the memorized sample.

Figure A7: Visualization of the real training image and the re-generated image for one memorized sample. The evaluated model is RAR-XXL and the SSCD score is 0.827. The ImageNet label for the image pair is 996.

## Appendix F Robustness

We evaluate our proposed Stage 1 (as described in [Section 5.3](https://arxiv.org/html/2606.23872#S5.SS3)) under real-world web-pipeline degradations (JPEG, resize, saturation) and a strong adaptive adversarial attack that directly optimizes perturbations to maximize $\mathcal{L}_{\text{Q}}$. The results are shown in [Table A8](https://arxiv.org/html/2606.23872#A3.T8). Without any augmentation, Stage 1 already retains $\geq 88.5\%$ TPR@1%FPR under all natural transforms. Optionally, we apply augmentations during the fine-tuning of the encoder, inspired by \[[36](https://arxiv.org/html/2606.23872#bib.bib134)\] and \[[10](https://arxiv.org/html/2606.23872#bib.bib156)\]. The augmented fine-tuning further boosts robustness to $97.4\%$ even under the adaptive adversarial attack.

## Appendix G Cross-architecture Generalization

In the model derivative setting, we mainly consider the *identical-architecture* setting, where $\mathcal{M}_2$ is trained on data generated by an $\mathcal{M}_1$ with the same architecture as $\mathcal{M}_2$. In this section, we further evaluate a *cross-architecture* setting, where $\mathcal{M}_2$ is trained on images generated by a model with a different architecture. We note that the cross-architecture setting is an *easier* setting for MGI, not a more challenging one. If $\mathcal{M}_2$ is trained on images generated by another model architecture, the distinct autoencoder architectures *enhance* the Stage 1 separation of $G_M/G'$ (rather than collapsing it), and $G_M/G_N$ reduces to standard MIA. In contrast, the identical-architecture setting is more challenging, because $G_M$, $G_N$, and $G'$ all come from the same autoencoder and therefore require Stage 3. We therefore choose the more challenging identical-architecture setting for evaluation in the main paper. [Table A10](https://arxiv.org/html/2606.23872#A4.T10) evaluates the cross-architecture setting on the SD 1.4 $\to$ SD 2.1 case and two heterogeneous DM-to-IAR pairs. The results show that DCB attains $\geq 99\%$ AUC on $G_M/G'$ for all settings.

## Appendix H Prompt Estimation

We note that ground-truth prompts can be absent in certain applications. In such cases, we use BLIP2/LLaVA to generate prompts for a given image. [Table A11](https://arxiv.org/html/2606.23872#A4.T11) compares the performance of our approach using ground-truth (GT) and BLIP2/LLaVA-generated captions. The results show that DCB still achieves high performance with estimated prompts.

## Appendix I Model Access

Our approach performs best with white-box access, where the optional fine-tuning is enabled. We also evaluate a gray-box setting, where only the outputs of the autoencoder and latent generator can be observed. In the gray-box setting, the optional fine-tuning is not possible, and our approach operates entirely with the original, un-fine-tuned encoder. [Table A12](https://arxiv.org/html/2606.23872#A4.T12) shows that our method achieves high performance on LlamaGen without the optional fine-tuning. Moreover, we never use the fine-tuned encoder for the diffusion models. Our method can operate in a gray-box setting for many models, requiring only loss values and generative model outputs, matching the access assumptions of SOTA MIAs (*e.g.*, PIAR, CLiD, ICAS).

## Appendix J Hyperparameter Analysis

We evaluate two hyperparameters in the KDE test in Stage 1: the density threshold $\alpha$ and the bandwidth multiplier $\sigma$. [Table A13](https://arxiv.org/html/2606.23872#A10.T13) shows that our approach is not sensitive to these two hyperparameters.
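
To make the role of the two hyperparameters concrete, here is a generic kernel-density threshold test. It is not the exact Stage 1 test, whose definition is given in the main paper: the way $\sigma$ scales the bandwidth (here, as a multiple of the reference score range) and the way $\alpha$ is applied (here, as a fraction of the peak reference density) are assumptions made for illustration, and all names and toy values are hypothetical.

```python
import math


def gaussian_kde(reference, bandwidth):
    """Return a density function for a 1-D Gaussian KDE over reference scores."""
    norm = 1.0 / (len(reference) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in reference
        )

    return density


def kde_test(query, reference, alpha=0.05, sigma=0.03):
    """True if the query score is dense enough under the reference KDE.

    Assumptions: bandwidth = sigma * (max - min) of the reference scores, and
    the acceptance rule is density(query) >= alpha * peak reference density.
    The reference scores must not all be identical (bandwidth would be zero).
    """
    bandwidth = sigma * (max(reference) - min(reference))
    density = gaussian_kde(reference, bandwidth)
    peak = max(density(r) for r in reference)
    return density(query) >= alpha * peak


if __name__ == "__main__":
    reference_scores = [0.10, 0.11, 0.12, 0.13, 0.14]  # toy reference scores
    print(kde_test(0.12, reference_scores))  # inside the reference cluster
    print(kde_test(0.50, reference_scores))  # far outside the cluster
```

The defaults mirror the fixed values used in the two sweeps of Table A13 ($\alpha = 0.05$, $\sigma = 0.03$).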

Table A13: Stage-1 KDE sensitivity. (a) $\alpha$ sweep ($\sigma = 0.03$). (b) $\sigma$ sweep ($\alpha = 0.05$).

Table A14: Extension of [Table 2](https://arxiv.org/html/2606.23872#S6.T2) to 5k samples.

Table A15: Per-image cost (sec) and DCB-vs-MIA cost ratio.

## Appendix K Sample Size Evaluation

We extend our experiments from 1K samples ([Table 2](https://arxiv.org/html/2606.23872#S6.T2)) to 5K samples on RAR-XXL and observe the same trends, as shown in [Table A14](https://arxiv.org/html/2606.23872#A10.T14).

## Appendix L Computational Cost

DCB targets the offline auditing regime shared by all SoTA MIAs (PIAR, CLiD, PRADA), not real-time use. As shown in [Table A15](https://arxiv.org/html/2606.23872#A10.T15), DCB adds only $0.16\times$–$0.46\times$ ($\mathcal{M}_1$) and $0.66\times$–$0.71\times$ ($\mathcal{M}_2$) on top of a single MIA inference. Additionally, DCB uses forward passes only (no backpropagation) and can be parallelized across images, so throughput scales linearly with GPUs and million-scale audits are practical.
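
As a back-of-the-envelope illustration of the overhead figures above, the sketch below estimates the wall-clock time of an audit. It reads "adds $r\times$ on top of a single MIA inference" as a total per-image cost of $(1 + r)$ times the MIA cost and assumes ideal linear scaling across GPUs; the per-image MIA time of one second is a hypothetical placeholder, not a number from Table A15.

```python
def audit_hours(num_images, mia_seconds_per_image, overhead_ratio, num_gpus=1):
    """Estimated wall-clock hours for an audit under ideal linear GPU scaling.

    overhead_ratio is the extra cost relative to one MIA inference,
    e.g. 0.46 for the upper end of the M1 range quoted in the text.
    """
    per_image = mia_seconds_per_image * (1.0 + overhead_ratio)
    return num_images * per_image / num_gpus / 3600.0


if __name__ == "__main__":
    # Hypothetical: 1 s per image for the MIA alone, one million images, 8 GPUs.
    for ratio in (0.16, 0.46, 0.66, 0.71):
        hours = audit_hours(1_000_000, 1.0, ratio, num_gpus=8)
        print(f"overhead {ratio:.2f}x -> {hours:.1f} hours")
```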