Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings
Summary
This paper presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for LLMs that improves robustness against paraphrasing and translation by leveraging contextual and token-level embeddings. Experimental results show improved detection after paraphrasing and translation compared to prior methods.
# Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

Source: [https://arxiv.org/html/2606.31602](https://arxiv.org/html/2606.31602)

Jonas Schäfer, Cezary Pilaszewicz, and Gerhard Wunder

Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany

Correspondence: jonas.schaefer2@fu-berlin.de

###### Abstract

This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation. DEW utilizes a signal-processing methodology, applying algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The method obfuscates the watermark by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW. Experimental results across multiple LLMs indicate that DEW improves post-paraphrase detection while maintaining competitive text quality, and remains detectable after translation, even when prior semantic watermarks degrade significantly. These findings position DEW as a practical and robust solution for safeguarding LLM-generated text and addressing critical issues in responsible AI deployment.

## 1 Introduction

Large language models (LLMs) have rapidly emerged as powerful tools capable of generating text with human-like fluency, with applications in creative writing, programming, and conversational agents. However, as these models advance, distinguishing LLM-generated from human-authored text becomes increasingly challenging, with profound implications for trust, misinformation, and content attribution.

Inference-time watermarking has recently gained significant attention in both research and policy discussions. These methods introduce a hidden statistical signal into the text during generation, which a corresponding algorithm can detect. Surface-level watermarks dynamically modify the generation process according to a predefined scheme, such as only sampling from a specific partition of the vocabulary Kirchenbauer et al. ([2023a](https://arxiv.org/html/2606.31602#bib.bib18)) or adding keyed exponential noise to logits Aaronson and Kirchner ([2022](https://arxiv.org/html/2606.31602#bib.bib1)). Some watermarks have been mathematically proven not to alter text statistics in expectation or have been shown not to cause any quality degradation perceptible to humans Dathathri et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib6)). Recent work has demonstrated that incorporating text semantics into the watermark signal computation improves resiliency to semantically invariant text modifications, typically at the cost of decreased text quality and/or increased computational overhead Liu et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib22)); Hou et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib11)).
Despite recent advances, paraphrasing and translation remain major challenges in detecting LLM-generated text. Furthermore, many watermarking schemes introduce patterns detectable by third parties, making them susceptible to reverse-engineering attacks Jovanović et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib17)). These patterns enable adversaries not only to identify the presence of a watermark but also to remove it systematically. LLM-generated text is commonly rewritten to meet application-specific demands or stylistic preferences. As a result, text watermarking faces persistent challenges, including threats to content authenticity, diminished trust in AI systems, regulatory ambiguities, and difficulties in legal enforcement. Reliable attribution of LLM-generated content is therefore essential, as is minimizing the misclassification of human-authored text.

In this work, we demonstrate that the robustness of semantic watermarks can be substantially enhanced by incorporating not only contextual semantics but also the semantics of candidate tokens into the watermark signal computation. Unlike prior methods Hou et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib11)), our approach achieves this with low computational overhead and ensures graceful degradation of the watermark signal under semantic shifts.

We present *Dual-Embedding Watermarking* (DEW), which combines two semantic embedding models to compute per-token watermark biases based on the cosine similarity between token and context embeddings. This procedure adds zero-centered, pseudo-random noise to the LLM-computed logits. During detection, watermarked tokens exhibit significantly higher signal scores than unwatermarked tokens in expectation. Crucially, since the variation in watermark signals depends on the differences in token embedding vectors, semantically similar tokens receive similar signals.
This property substantially improves translation robustness, yet it has been largely overlooked in prior work, aside from a few exceptions He et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib10)); Hou et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib11)).

Our results demonstrate significantly improved robustness against LLM-assisted translation, along with modest gains in paraphrase robustness, while maintaining text quality competitive with the most robust baselines. Notably, even after translation from English into German, DEW achieves a true positive rate (TPR) of up to 65% at a 1% false positive rate (FPR). At the same time, DEW incurs significantly lower computational overhead during text generation and watermark detection than most other semantic watermarks and remains robust to simple reverse-engineering attacks.

The remainder of this paper is organized as follows: Section [2](https://arxiv.org/html/2606.31602#S2) reviews related work, Section [3](https://arxiv.org/html/2606.31602#S3) details the methodology, Section [4](https://arxiv.org/html/2606.31602#S4) presents experimental results, Section [5](https://arxiv.org/html/2606.31602#S5) concludes with implications, and Section [6](https://arxiv.org/html/2606.31602#S6) outlines limitations and future directions.

## 2 Related Work

Text watermarking is a special case of linguistic steganography that embeds a hidden signal in a passage of text. LLM watermarks are commonly evaluated along three core dimensions: detectability, requiring verifiability at low false-positive rates; secrecy, requiring no easily detectable artifacts; and robustness, requiring evasion to substantially modify the watermarked text, especially its semantics Kuditipudi et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib21)). Deployment also requires soundness: independently generated text, including unusual or non-native writing, should rarely be falsely flagged.
These goals are inherently in tension: stronger detectability can reduce secrecy or robustness, while stronger secrecy, including distortion-freeness, can make detection harder. Effective LLM watermarks should also be agnostic to the generating model and prompt, computationally efficient at generation and detection time, and compatible with standard autoregressive decoding.

Before LLMs, text watermarking largely relied on rule-based transformations such as synonym substitution Topkara et al. ([2006](https://arxiv.org/html/2606.31602#bib.bib31)) and paraphrasing Atallah et al. ([2002](https://arxiv.org/html/2606.31602#bib.bib3)). Because such methods use fixed substitutions, they systematically alter text statistics, making the watermark easier to detect and remove Tang et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib30)); Ziegler et al. ([2019](https://arxiv.org/html/2606.31602#bib.bib38)). Recent LLM watermarking research instead focuses on inference-time schemes, which embed the signal directly during generation by modifying the model's token-selection process, but typically require access to model logits. Consequently, they cannot be deployed for black-box APIs unless the provider controls insertion, and they can be disabled in locally hosted models.

### 2.1 Surface-level Watermarks

Most LLM watermarks operate at the surface level, injecting the signal based on token identities or exact token contexts without explicitly modeling semantics. These methods are simple and inexpensive, but exact context dependence makes their signals vulnerable to local edits. Kirchenbauer et al. ([2023a](https://arxiv.org/html/2606.31602#bib.bib18), [b](https://arxiv.org/html/2606.31602#bib.bib19)) propose a scheme, referred to here as KGW, that hashes the preceding $k$ tokens to pseudo-randomly partition the vocabulary into green and red lists and then boosts green-list logits.
Parallel unpublished work by Aaronson and Kirchner ([2022](https://arxiv.org/html/2606.31602#bib.bib1)), referred to as EXP, similarly hashes the previous $k$ tokens but samples using keyed exponential noise and Perturb-and-MAP decoding Papandreou and Yuille ([2011](https://arxiv.org/html/2606.31602#bib.bib26)). Both schemes bias the distribution toward subsets of $k$-grams Kuditipudi et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib21)); Wu et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib35)), yielding a trade-off: larger $k$ improves secrecy by reducing repeated contexts, whereas smaller $k$ improves robustness by making local edits less disruptive. In the limiting case $k=0$, KGW becomes Unigram Zhao et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib37)), which is highly robust but vulnerable to reverse-engineering attacks Jovanović et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib17)). Dathathri et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib6)) instead propose SynthID, which uses Tournament Sampling to optimize a secret statistical watermark score and also provides a distortion-free mode with reduced detectability.

Distortion-free and distribution-preserving watermarks aim to improve secrecy by avoiding changes to the output distribution.
In this sense, EXP is distortion-free only when $k$ is large enough to avoid repeated contexts. UnbiasedWM Hu et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib13)) uses inverse-transform sampling and permutation-based reweighting to integrate a watermark without altering token probabilities, but its detection requires token logits and ideally an approximate reconstruction of the prompt, limiting agnosticism Wu et al. ([2024b](https://arxiv.org/html/2606.31602#bib.bib36)). DiPmark Wu et al. ([2024b](https://arxiv.org/html/2606.31602#bib.bib36)) provides an agnostic alternative by adapting reweighting to increase the total probability mass of green-list tokens rather than uniformly boosting every green-list token. Both UnbiasedWM and DiPmark are provably distortion-free in the absence of watermark key collisions Wu et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib35)).

### 2.2 Semantic Watermarks

Semantic watermarks are motivated by the limited robustness of surface-level schemes against meaning-preserving transformations such as paraphrasing and translation. Rather than relying only on token hashes, they condition the watermark signal, its parameters, or its training objective on semantic representations. Some semantic watermarks, including DEW, also make each candidate token's signal depend on that token's semantics. This distinction is important because context-level semantics can stabilize the signal under paraphrasing, whereas candidate-token semantics make synonym substitutions and translations more likely to preserve token-level evidence.

TS Huo et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib15)) extends the green-list paradigm by learning token-specific vocabulary split ratios and green-list logit biases from the preceding-token embedding. Detection remains KGW-like via a one-sided $z$-test adjusted for varying split ratios.
While TS improves detectability and semantic coherence over fixed-parameter green-list schemes, it does not directly assign similar watermark signals to semantically related candidate tokens.

ATW Liu and Bu ([2024](https://arxiv.org/html/2606.31602#bib.bib2)) combines entropy-gated insertion with semantic logit scaling. It leaves low-entropy decoding steps unmodified and, at selected high-entropy steps, maps embeddings of the preceding text to a logits-scaling vector. Detection approximates a likelihood-ratio test over the tokens selected by the same entropy criterion. Compared with DEW, ATW conditions the signal on context semantics but does not explicitly couple candidate-token scores to candidate-token semantics.

SIR Liu et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib22)) uses an auxiliary LLM to embed the preceding context and transforms these embeddings into watermark logits with a neural network trained to preserve semantic similarity while maintaining diversity and unbiasedness. This makes the watermark more stable under semantically invariant edits, but the signal is primarily context-conditioned and requires an auxiliary learned mapping in addition to the host LLM.

X-SIR He et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib10)) extends SIR by clustering semantically similar tokens and assigning a shared watermark bias within each cluster, making it the closest prior work to DEW because it incorporates candidate-token semantics during signal computation.

SemStamp Hou et al. ([2024a](https://arxiv.org/html/2606.31602#bib.bib11)) operates at sentence granularity: it embeds each generated sentence and uses rejection sampling to output only sentences whose embedding falls into an allowed locality-sensitive hashing (LSH) partition. This improves paraphrastic robustness but increases generation time by 5- to 20-fold.
Its follow-up, $k$-SemStamp Hou et al. ([2024b](https://arxiv.org/html/2606.31602#bib.bib12)), replaces LSH with $k$-means clustering to reduce rejection rates and improve robustness, but requires specifying the generation domain at initialization. Because these sentence-level rejection-sampling schemes require repeated sentence generation and, for $k$-SemStamp, domain-specific initialization, they are not directly comparable within our token-level logit-bias evaluation protocol; we therefore do not evaluate SemStamp or $k$-SemStamp in this study.

## 3 Methodology

LLMs have a vocabulary $\mathcal{V}$ containing words or word fragments (*tokens*). Given an input sequence $\mathbf{x}=(x_{1},\ldots,x_{t-1})$, the model computes a probability distribution over $\mathcal{V}$ by producing a set of *logits* $\ell$, where each logit represents the unnormalized log-probability of the corresponding token. Each token $x_{t}$ is selected by sampling from this distribution or using a decoding method such as beam search. This process repeats until the LLM generates an end-of-sequence token or reaches a maximum text length.

Inference-time watermarking schemes modify probability distributions by either manipulating the sampling process or by directly adjusting the distribution, as in this work, where watermark biases are added to the candidate token logits during text generation (*watermark insertion*, Section [3.1](https://arxiv.org/html/2606.31602#S3.SS1)). To introduce secrecy, this process generally employs a pseudo-random number generator (PRNG) that modifies the signal using a secret key known only to the model provider. Most schemes also require this key to determine whether a candidate text contains the watermark, a procedure known as *watermark detection* (Section [3.2](https://arxiv.org/html/2606.31602#S3.SS2)).
To improve text diversity and, in turn, secrecy, the embedded signal is typically made dependent on a sliding window of directly preceding tokens (the *watermark context*) by hashing them along with the secret key. However, due to the cryptographic nature of the hash function, even minor changes in the context yield statistically independent signals. For this reason, the robustness of such schemes decreases with larger watermark context widths, although text diversity and watermark secrecy improve.

Semantic watermarks enhance robustness against semantically invariant modifications, such as paraphrasing and translation. These schemes use a numeric representation of the context semantics to assign the same vocabulary partitioning to semantically similar contexts. This representation is commonly obtained through *embedding models*, which compute vector representations of token sequences. These models are trained, for example, via contrastive learning, to map semantically similar texts to nearby points in the embedding space. Semantic watermarking leverages this property by making the watermark signal contingent on the embedding vector.

Although semantic watermarks offer improved robustness to semantically invariant changes in the watermark context, most schemes do not consider inter-token semantic similarity when calculating the bias distributions. For this reason, substituting a token with a synonym has a high chance of removing the signal embedded in that token. DEW computes separate semantic embeddings for the context and for each candidate token to assign similar biases to tokens with close embedding vectors. Additionally, the signal carried by each token smoothly degrades with semantic shifts in either the context or the token itself, further improving robustness.
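The fragility of hash-based context dependence described above can be illustrated with a small sketch. It assumes SHA-256 as the hash function and a 64-bit seed, and the names `context_seed`, `hamming_distance`, the key, and the token IDs are all hypothetical choices made for illustration; none of the surveyed schemes is claimed to use exactly this construction.

```python
import hashlib
from typing import Sequence


def context_seed(secret_key: bytes, context_ids: Sequence[int]) -> int:
    """Derive a 64-bit PRNG seed from a secret key and the watermark context.

    Assumption: token IDs are non-negative and fit into 4 bytes.
    """
    h = hashlib.sha256(secret_key)
    for tok in context_ids:
        h.update(tok.to_bytes(4, "big"))
    return int.from_bytes(h.digest()[:8], "big")


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


key = b"hypothetical-secret-key"
original = [101, 2023, 3185]
edited = [101, 2023, 3186]  # one token replaced, e.g. by a synonym

seed_a = context_seed(key, original)
seed_b = context_seed(key, edited)

# A single substituted token yields an unrelated seed, so the vocabulary
# partition (or noise vector) derived from it is effectively independent.
print(hamming_distance(seed_a, seed_b), "of 64 seed bits differ")
```

Because every token in the window feeds the hash, a wider window gives an editor more positions at which one change resets the signal, which is the robustness cost the paragraph above describes.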
[Figure 1: block diagram of the DEW insertion pipeline. The secret key $K$ seeds a pseudo-random number generator (PRNG) that produces the pseudo-random matrices $\mathbf{R}_C$ and $\mathbf{R}_T$. The context $C$ is passed through the context embedding model to obtain the context embedding $\mathbf{e}_C$ and the context projection $\mathbf{p}_C=\mathbf{e}_C\mathbf{R}_C$. The LLM provides the top-$m$ logit scores $\ell$ and the top-$m$ token IDs; the token embedding model maps these IDs to token embeddings $\mathbf{E}_T$ with token projections $\mathbf{P}_T=\mathbf{E}_T\mathbf{R}_T$. The logit bias vector $\mathbf{b}=\lambda\cdot\tanh(\gamma\cdot\mathbf{P}_T\mathbf{p}_C)$ is added to give the watermarked logits $\ell+\mathbf{b}$.]

Figure 1: An illustration of the DEW insertion procedure for a single generation step. Previously generated tokens ($C$) are jointly embedded, while the top-$m$ candidate token embeddings are computed separately. All embeddings are projected for obfuscation, and the dot product of the projections is added to the original logits as token-specific watermark biases. We sample from the updated logits. Inputs are highlighted in blue, and the output watermarked logits in red. For conciseness, we omitted normalization, whitening, orthonormalization, and standardization from the diagram.

### 3.1 Watermark Insertion

#### 3.1.1 Setup

In addition to the LLM, DEW incorporates two embedding models. The token embedding model $M_T$ maps individual tokens to $d_T$-dimensional vectors. Similarly, the context embedding model $M_C$ maps token sequences of arbitrary length to $d_C$-dimensional vectors. To initialize the algorithm, a secret key $K$ is employed to seed a cryptographically secure PRNG. For the sake of simplicity and efficiency in our experiments, we opted for the default non-secure PyTorch Philox PRNG.
This generator randomly samples from the standard normal distribution to produce two matrices $\mathbf{R}_T\in\mathbb{R}^{d_T\times n}$ and $\mathbf{R}_C\in\mathbb{R}^{d_C\times n}$. Through random projections, these matrices obfuscate the embedding vectors while approximately preserving distances. We further introduce the value $n$, which we call the *projection dimensionality*. While the embedding models determine $d_T$ and $d_C$, $n$ is a tunable hyperparameter controlling the dimension of the random-projection space. We conservatively set $n=\max(d_T,d_C)$ in this work. However, by the Johnson-Lindenstrauss lemma Johnson and Lindenstrauss ([1984](https://arxiv.org/html/2606.31602#bib.bib39)), one can often choose a significantly smaller $n$ while approximately preserving distances. Furthermore, in Appendix [C](https://arxiv.org/html/2606.31602#A3), we propose an optional block-wise orthogonal construction of $\mathbf{R}_T$ and $\mathbf{R}_C$ that is *guaranteed* to preserve angles between embedding vectors while still obfuscating them through pseudo-random rotations.

#### 3.1.2 Semantic Extraction

At each generation step, the LLM computes the logits $\ell\in\mathbb{R}^{|\mathcal{V}|}$ as usual. We use $M_T$ to embed the candidate tokens for the next position. In practice, to reduce computational overhead, it is often sufficient to consider only the top $m\in\mathbb{N}$ tokens with the highest scores in $\ell$, yielding an embedding matrix $\mathbf{E}_T\in\mathbb{R}^{m\times d_T}$. Another option is to apply nucleus sampling, which dynamically selects the smallest set of tokens whose cumulative probability exceeds a specified threshold. Each row of $\mathbf{E}_T$ is an embedding vector in $\mathbb{R}^{d_T}$ associated with one of the $m$ highest-scoring candidate tokens.
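The setup and candidate-selection steps above can be sketched as follows. This is a minimal, standard-library illustration rather than the authors' implementation: the helper names `gaussian_matrix`, `make_projections`, and `top_m` are hypothetical, the toy dimensions are far smaller than real embedding sizes, and Python's `random.Random` (a Mersenne Twister) stands in for the PRNG even though it is not cryptographically secure.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gaussian_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Sample a rows x cols matrix with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_projections(secret_key: int, d_t: int, d_c: int, n: int) -> Tuple[Matrix, Matrix]:
    """Derive R_T (d_t x n) and R_C (d_c x n) deterministically from the key."""
    rng = random.Random(secret_key)  # NOT cryptographically secure; illustration only
    r_t = gaussian_matrix(rng, d_t, n)
    r_c = gaussian_matrix(rng, d_c, n)
    return r_t, r_c


def top_m(logits: List[float], m: int) -> List[int]:
    """Indices of the m highest logits, best first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:m]


# Toy dimensions: d_T = 4, d_C = 3, n = max(d_T, d_C) = 4.
r_t, r_c = make_projections(secret_key=42, d_t=4, d_c=3, n=4)
candidates = top_m([0.1, 2.0, -1.0, 1.5], m=2)
print(len(r_t), len(r_t[0]), len(r_c), len(r_c[0]), candidates)
```

Since the matrices depend only on the key and the dimensions, they can be generated once and cached, and the detector can regenerate the same matrices from the same key.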
Optionally, a whitening transformation can be applied to the token embeddings to ensure isotropy (uniformity in all directions) in the embedding space. For various applications, whitening generally makes embedding similarity metrics more meaningful and consistent across dimensions Huang et al. ([2021](https://arxiv.org/html/2606.31602#bib.bib14)); Diera et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib7)). Since sequence embeddings are typically derived by pooling individual token embeddings, it is also feasible to apply whitening before pooling. However, we do not apply whitening during context embedding computation in this study.

**Algorithm 1** DEW Watermark Insertion (Single Step)

**Require:** LLM logits $\ell\in\mathbb{R}^{|\mathcal{V}|}$, watermark context $\mathbf{c}=(x_{t-k},\ldots,x_{t-1})$, token embedding model $M_T$, context embedding model $M_C$, secret key $K$, top-$m$ candidate count, watermark strength $\lambda$, saturation factor $\gamma$, projection dimensionality $n$.

**Ensure:** Watermarked logits $\ell'$

1. Use $K$ to seed a PRNG (only once per session; can be cached)
2. Regenerate (or recall) $\mathbf{R}_T\in\mathbb{R}^{d_T\times n}$ and $\mathbf{R}_C\in\mathbb{R}^{d_C\times n}$
3. Compute projected context embedding:
4. $\mathbf{e}_C\leftarrow M_C(\mathbf{c})\in\mathbb{R}^{d_C}$
5. Normalize $\mathbf{e}_C$
6. $\mathbf{p}_C\leftarrow\operatorname{normalize}(\mathbf{e}_C\,\mathbf{R}_C)\in\mathbb{R}^{n}$
7. Compute (or recall) projected token embeddings:
8. Let $\mathcal{T}\subseteq\mathcal{V}$ be the set of top-$m$ tokens from $\ell$
9. $\mathbf{E}_T\leftarrow M_T(\mathcal{T})\in\mathbb{R}^{m\times d_T}$
10. Optional: Apply whitening to $\mathbf{E}_T$
11. Normalize rows of $\mathbf{E}_T$
12. $\mathbf{P}_T\leftarrow\operatorname{row\_normalize}(\mathbf{E}_T\,\mathbf{R}_T)\in\mathbb{R}^{m\times n}$
13. Compute biases and add them to logits:
14. $\mathbf{b}\leftarrow\lambda\cdot\tanh\bigl(\gamma\sqrt{n}\cdot\mathbf{P}_T\mathbf{p}_C\bigr)\in\mathbb{R}^{m}$
15. Insert $\mathbf{b}$ into the corresponding $m$ positions of $\ell$: $\ell'\leftarrow\ell+\mathbf{b}$
16. **output** $\ell'$ (watermarked logits for the next token)

#### 3.1.3 Obfuscation

Next, we normalize the rows of $\mathbf{E}_T$ and multiply the result by $\mathbf{R}_T$, applying a random linear transformation to each embedding vector for obfuscation. Since the token and context embedding models are fixed (and potentially public), obfuscating the embeddings is essential to enable secrecy.
We achieve this through the secret linear transformations $\mathbf{R}_T$ and $\mathbf{R}_C$. The same process is applied to the watermark context (for example, the $w$ preceding tokens), yielding an embedding vector $\mathbf{e}_C\in\mathbb{R}^{d_C}$ and projection vector $\mathbf{p}_C=\mathbf{e}_C\mathbf{R}_C\in\mathbb{R}^{n}$. Notably, all token embeddings and their projections can be precomputed offline.

#### 3.1.4 Bias Computation

We then calculate the logit bias vector $\mathbf{b}$ by taking the dot product of the context projection vector with each projected token embedding vector. Since both vectors are normalized, their dot product equals the cosine similarity, which ranges from $-1$ to $1$ and quantifies the cosine of the angle between them. This value reflects the degree of alignment, with $1$ indicating perfect alignment, $-1$ perfect opposition, and $0$ orthogonality. Since $\mathbf{E}_T$ and $\mathbf{e}_C$ may originate from different models and are independently obfuscated through projection, the cosine similarity lacks a direct interpretive meaning. Nevertheless, it provides a useful keyed semantic alignment signal: small changes in token or context embeddings induce controlled changes in the projected cosine score, so semantically similar continuations tend to receive similar biases.

Under an isotropic spherical null model, this score is symmetric around $0$ with a known Beta-type distribution (Appendix [A](https://arxiv.org/html/2606.31602#A1)), so text generated independently of $K$ attains an expected score of zero. We derive the exact null distribution of the alignment score $\mathbf{p}_T^{\top}\mathbf{p}_C$ (Beta-type) and its high-dimensional Gaussian approximation, and use it to motivate the $\sqrt{n}$ scaling and false-positive calibration.
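For orientation, the null-model facts behind the $\sqrt{n}$ scaling follow from a standard symmetry argument. The block below sketches that textbook result under the idealized assumption that $\mathbf{u}$ is uniform on the unit sphere $S^{n-1}$ and $\mathbf{v}$ is a fixed unit vector; the symbols $\mathbf{u}$, $\mathbf{v}$, and $s$ are introduced here for illustration, and the block does not reproduce the derivation in Appendix A.

```latex
\begin{align*}
  s &= \mathbf{u}^{\top}\mathbf{v} \overset{d}{=} u_1
      && \text{(rotational invariance of the uniform distribution)} \\
  \mathbb{E}[s] &= 0
      && \text{(symmetry: } \mathbf{u} \overset{d}{=} -\mathbf{u}\text{)} \\
  \sum_{i=1}^{n} u_i^2 = 1
      \;&\Rightarrow\; \mathbb{E}[u_1^2] = \tfrac{1}{n}
      \;\Rightarrow\; \operatorname{Var}(s) = \tfrac{1}{n} \\
  f_s(s) &= \frac{\Gamma\!\left(\tfrac{n}{2}\right)}
                 {\sqrt{\pi}\,\Gamma\!\left(\tfrac{n-1}{2}\right)}
            \,(1-s^2)^{\frac{n-3}{2}}, \quad s\in[-1,1]
      && \text{i.e. } \tfrac{1+s}{2} \sim
         \operatorname{Beta}\!\left(\tfrac{n-1}{2},\tfrac{n-1}{2}\right) \\
  \sqrt{n}\,s &\xrightarrow{\;d\;} \mathcal{N}(0,1)
      && \text{as } n\to\infty
\end{align*}
```

Under these assumptions, multiplying the raw cosine score by $\sqrt{n}$ gives a quantity with unit variance for every $n$, which is why the saturation factor $\gamma$ keeps a comparable meaning when the projection dimensionality changes.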
The variance of the dot product of two random vectors uniformly distributed on the unit sphere depends on the dimension of said vectors. In our idealized null model, the dot product of a spherical vector with any fixed unit vector has variance $1/n$. Therefore, multiplying the dot products by $\sqrt{n}$ yields an approximately unit-variance baseline score. In practice, the spherical model is an analytic baseline rather than an exact description of natural-language token statistics. We therefore use it to motivate the $\sqrt{n}$ scaling and complement it with empirical thresholding when reporting fixed-FPR detection results.

Finally, the dot products are passed through the $\tanh$ function to compute the bias vector $\mathbf{b}$:

$$\mathbf{b}=\lambda\cdot\tanh\bigl(\gamma\sqrt{n}\cdot\mathbf{P}_T\mathbf{p}_C\bigr)\in\mathbb{R}^{m} \tag{1}$$

Here, $\lambda$ is a hyperparameter to control the watermark signal strength. Moreover, $\tanh(\cdot)$ denotes the element-wise application of the hyperbolic tangent function. Using a non-linear activation function such as $\tanh$ facilitates smooth clipping of the pre-scaling biases, mitigating the impact of outliers on text quality. By scaling the argument of $\tanh$ by a hyperparameter $\gamma\in\mathbb{R}^{+}_{*}$, the clipping level can be adjusted: larger scaling factors accentuate saturation, while smaller factors preserve a broader dynamic range of biases. Higher saturation results in more tokens receiving extreme bias values, approaching $-\lambda$ or $\lambda$. This behavior is reminiscent of the green/red list KGW watermark Kirchenbauer et al. ([2023a](https://arxiv.org/html/2606.31602#bib.bib18)), though KGW does not involve assigning negative biases to logits. Finally, we sample the next token from $\ell+\mathbf{b}$.
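Equation 1 and the final logit update can be sketched in a few lines. The following is a simplified, standard-library illustration under stated assumptions: the projections are taken as given unit vectors (whitening and the random projection itself are omitted), and the helper names `normalize`, `dot`, `dew_bias`, and `add_bias` as well as the toy values are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dew_bias(p_tokens: Sequence[Sequence[float]], p_context: Sequence[float],
             lam: float, gamma: float) -> List[float]:
    """b = lam * tanh(gamma * sqrt(n) * P_T p_C), evaluated row by row."""
    scale = gamma * math.sqrt(len(p_context))
    return [lam * math.tanh(scale * dot(p_t, p_context)) for p_t in p_tokens]


def add_bias(logits: Sequence[float], top_ids: Sequence[int],
             biases: Sequence[float]) -> List[float]:
    """Add each bias to the logit of its candidate token; other logits are unchanged."""
    out = list(logits)
    for idx, b in zip(top_ids, biases):
        out[idx] += b
    return out


# Toy example with n = 4: one aligned, one opposed, one orthogonal candidate.
p_c = normalize([1.0, 0.0, 0.0, 0.0])
p_t = [normalize([1.0, 0.0, 0.0, 0.0]),
       normalize([-1.0, 0.0, 0.0, 0.0]),
       normalize([0.0, 1.0, 0.0, 0.0])]
biases = dew_bias(p_t, p_c, lam=2.0, gamma=0.5)

logits = [0.3, 1.2, -0.5, 2.0, 0.1]
watermarked = add_bias(logits, top_ids=[3, 1, 0], biases=biases)
print(biases, watermarked)
```

In this toy case the aligned candidate is boosted, the opposed candidate is penalized by the same amount, and the orthogonal candidate is left untouched, which reflects the zero-centered nature of the bias; every bias stays within $(-\lambda, \lambda)$ because of the $\tanh$ clipping.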
We describe the watermark insertion procedure for one generation step in Algorithm [1](https://arxiv.org/html/2606.31602#alg1) and illustrate it in Figure [1](https://arxiv.org/html/2606.31602#S3.F1).

### 3.2 Watermark Detection

The detection procedure mirrors the insertion process. It iterates over a given candidate text document token by token and sums the biases embedded in each token to obtain a document-level watermark score. This score can be thresholded for binary classification, with the threshold tunable to control the FPR. Specifically, for each observed token $x_t$, we first compute its embedding $M_T(x_t)\in\mathbb{R}^{d_T}$, apply a whitening transformation, normalize, and then project the resulting vector via multiplication with $\mathbf{R}_T$ to obtain $\mathbf{p}_T$. For the context $\mathbf{c}=(x_{t-w},\ldots,x_{t-1})$, we compute its embedding vector $\mathbf{e}_C=M_C(\mathbf{c})$, normalize it, and project it via $\mathbf{R}_C$ to obtain $\mathbf{p}_C$. Finally, we obtain the token bias score by computing $b=\lambda\cdot\tanh(\gamma\sqrt{n}\cdot\mathbf{p}_T^{\top}\mathbf{p}_C)\in\mathbb{R}$, which only differs from Equation [1](https://arxiv.org/html/2606.31602#S3.E1) in $\mathbf{p}_T$, as we now only compute the score of the observed token, instead of $m$ candidate tokens. We describe the watermark detection procedure for one token in Algorithm [2](https://arxiv.org/html/2606.31602#alg2) (Appendix [D](https://arxiv.org/html/2606.31602#A4)).

The detection procedure can be formalized as a statistical hypothesis test (Appendix [A.3](https://arxiv.org/html/2606.31602#A1.SS3)) to control FPRs rigorously and improve interpretability. The resulting empirical score distributions match the Gaussian baseline derived in Appendix [A.2](https://arxiv.org/html/2606.31602#A1.SS2).
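The detection loop described in this section, together with a generic empirical threshold at a fixed FPR, can be sketched as follows. This is an illustrative outline only: it assumes the per-token projections $\mathbf{p}_T$ and $\mathbf{p}_C$ have already been computed as unit vectors, the function names `token_score`, `document_score`, and `empirical_threshold` are hypothetical, and the threshold rule is a plain empirical quantile rather than a reproduction of any particular toolkit's procedure.

```python
import math
from typing import List, Sequence


def token_score(p_token: Sequence[float], p_context: Sequence[float],
                lam: float, gamma: float) -> float:
    """Per-token score: lam * tanh(gamma * sqrt(n) * p_T^T p_C)."""
    cos = sum(a * b for a, b in zip(p_token, p_context))
    return lam * math.tanh(gamma * math.sqrt(len(p_context)) * cos)


def document_score(token_projs: Sequence[Sequence[float]],
                   context_projs: Sequence[Sequence[float]],
                   lam: float = 1.0, gamma: float = 0.5) -> float:
    """Sum the per-token scores over all (token, context) pairs of a document."""
    return sum(token_score(pt, pc, lam, gamma)
               for pt, pc in zip(token_projs, context_projs))


def empirical_threshold(null_scores: Sequence[float], target_fpr: float) -> float:
    """Threshold t such that at most target_fpr of the null scores exceed t.

    A document is flagged as watermarked when its score is strictly above t.
    """
    ranked: List[float] = sorted(null_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # tolerated number of false positives
    return ranked[allowed] if allowed < len(ranked) else -math.inf


# Toy document with two tokens in a 2-dimensional projection space.
score = document_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]])

# Toy null scores standing in for scores of human-authored texts.
null = list(range(100))
t5 = empirical_threshold(null, 0.05)
print(score, t5, sum(s > t5 for s in null) / len(null))
```

Calibrating the threshold on scores of unwatermarked text keeps the reported FPR tied to observed data, which matters because the spherical null model is only an approximation of real token statistics.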
For comparability, we report TPRs at fixed FPRs in Section [4](https://arxiv.org/html/2606.31602#S4) using the standard empirical thresholding procedure from MarkLLM Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)).

## 4 Experiments

### 4.1 Language Models and Hyperparameters

We use Llama-3.2-3B Dubey et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib9)) for all main experiments and additionally evaluate Gemma-7B Mesnard et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib41)) in Appendix [B.6](https://arxiv.org/html/2606.31602#A2.SS6). We generate text via multinomial sampling. To enhance text diversity, we apply a four-gram blocking constraint. This ensures that no four-token sequence that has already been generated can be repeated. As DEW's semantic context embedding model $M_C$, we choose paraphrase-multilingual-mpnet-base-v2 Reimers and Gurevych ([2019](https://arxiv.org/html/2606.31602#bib.bib28)) ($d_C=768$) due to its multilingual paraphrase robustness. We obtain the token embeddings from the word embedding layer of the underlying LLM (for Llama-3.2-3B, $d_T=3\,072$).

As hyperparameters for DEW, we use $m=32$, $n=3\,072$, and $\gamma=0.5$ throughout all experiments with Llama-3.2-3B. Further, we apply whitening to token embeddings and take the $\tanh$ of the bias scores before scaling by $\lambda$. As we observe no significant improvement from applying orthonormalization to the random matrices with our specific models, we omit this step from the evaluation. We report scores for different context widths $k$ and watermark strengths $\lambda$ in Table [1](https://arxiv.org/html/2606.31602#S4.T1). The MarkLLM Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)) hyperparameters for baseline watermarking schemes are provided in Appendix [E](https://arxiv.org/html/2606.31602#A5).
### 4.2 Dataset and Prompt

For text generation, we use the C4 dataset Raffel et al. ([2020](https://arxiv.org/html/2606.31602#bib.bib27)), as it is widely employed for evaluating watermarking effectiveness in high-entropy, free-form text generation tasks. From each document, we take the first 30 tokens as the prompt and generate 200 additional tokens as a completion. Since the original texts in the dataset are human-authored, they serve as counterexamples.

### 4.3 Detectability and Robustness Analysis

Following prior work, we assess detectability at fixed FPRs of 1% and 5%. The reported scores are based on watermark evaluations of 500 watermarked and 500 human-authored completions. To compute the scores, we apply a dynamic threshold that maximizes the TPR while maintaining FPRs of 1% or 5%. This thresholding is implemented in the MarkLLM toolkit Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)). To evaluate robustness against paraphrasing and translation, we prompt GPT-4o-mini-2024-07-18 to rewrite the watermarked text while preserving its meaning and tone. Notably, GPT-4o-mini is substantially more capable than Llama-3.2-3B, which we use for text generation. The exact prompts are presented in Appendix [F](https://arxiv.org/html/2606.31602#A6).

Table 1: True positive rates in unattacked, post-paraphrasing, and post-translation scenarios at false positive rates of 1 and 5 percent, evaluated on human-authored texts. The best scores across all watermarking schemes are highlighted in bold, while the top scores within each category (semantic/surface-level) are underlined. The text quality measures are computed on unmodified watermarked text. The PPL score represents the median perplexity across all texts. Human-authored completions have a median PPL of 10.5 while unwatermarked generations achieve a median score of 8.0.

| Category | Watermark (config) | Unmodified, 1% FPR | Robustness (pp), 1% FPR | Robustness (pp), 5% FPR | Robustness (tr-de), 1% FPR | Robustness (tr-de), 5% FPR | Robustness (tr-fr), 1% FPR | Robustness (tr-fr), 5% FPR | PPL ↓ | NPS ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| Semantic | DEW ($k=3$, $\lambda=1.5$) | 0.992 | 0.538 | 0.794 | 0.596 | 0.886 | 0.412 | 0.760 | 9.188 | −0.104 |
| Semantic | DEW ($k=3$, $\lambda=2.0$) | 0.998 | 0.746 | 0.916 | 0.650 | 0.906 | 0.498 | 0.806 | 10.750 | −0.226 |
| Semantic | DEW ($k=5$, $\lambda=1.5$) | 0.988 | 0.410 | 0.818 | 0.252 | 0.818 | 0.116 | 0.646 | 9.063 | −0.110 |
| Semantic | DEW ($k=5$, $\lambda=2.0$) | 0.998 | 0.574 | 0.912 | 0.368 | 0.870 | 0.144 | 0.702 | 10.438 | −0.238 |
| Semantic | SIR | 0.976 | 0.674 | 0.866 | 0.280 | 0.624 | 0.228 | 0.550 | 9.625 | −0.216 |
| Semantic | X-SIR | 0.950 | 0.660 | 0.812 | 0.406 | 0.618 | 0.266 | 0.500 | 9.500 | −0.220 |
| Semantic | ATW | 1.000 | 0.738 | 0.896 | 0.018 | 0.124 | 0.004 | 0.032 | 11.063 | −0.018 |
| Semantic | TS | 1.000 | 0.604 | 0.798 | 0.102 | 0.266 | 0.042 | 0.150 | 10.438 | −0.122 |
| Surface-level | SynthID-D ($k=3$) | 0.998 | 0.490 | 0.706 | 0.024 | 0.096 | 0.026 | 0.116 | 6.547 | −0.008 |
| Surface-level | SynthID-D ($k=5$) | 0.996 | 0.180 | 0.352 | 0.016 | 0.044 | 0.018 | 0.038 | 6.375 | −0.010 |
| Surface-level | SynthID-ND ($k=3$) | 0.998 | 0.382 | 0.614 | 0.016 | 0.066 | 0.024 | 0.086 | 6.625 | 0.034 |
| Surface-level | SynthID-ND ($k=5$) | 0.996 | 0.186 | 0.364 | 0.004 | 0.032 | 0.008 | 0.040 | 6.563 | −0.018 |
| Surface-level | KGW ($k=1$) | 1.000 | 0.566 | 0.872 | 0.022 | 0.088 | 0.012 | 0.070 | 10.438 | −0.130 |
| Surface-level | KGW ($k=3$) | 1.000 | 0.188 | 0.382 | 0.014 | 0.104 | 0.004 | 0.082 | 10.438 | −0.098 |
| Surface-level | KGW ($k=5$) | 0.998 | 0.068 | 0.214 | 0.018 | 0.060 | 0.018 | 0.080 | 10.563 | −0.136 |
| Surface-level | DiPmark ($k=3$) | 0.994 | 0.090 | 0.286 | 0.012 | 0.054 | 0.014 | 0.040 | 8.938 | −0.028 |
| Surface-level | UnbiasedWM ($k=3$) | 1.000 | 0.224 | 0.352 | 0.022 | 0.072 | 0.014 | 0.050 | 9.188 | −0.038 |

### 4.4 Text Quality Analysis

To assess the quality of the watermarked text, we compute its perplexity (PPL) Jelinek et al. ([2005](https://arxiv.org/html/2606.31602#bib.bib16)) under a more powerful LLM, for which we use Llama-3.1-8B Dubey et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib9)). The perplexity is defined as the exponentiated average negative log-likelihood of the observed token sequence.
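The perplexity definition above can be written out in a few lines. This is a minimal sketch that assumes the per-token natural-log probabilities have already been obtained from the scoring model; the function name `perplexity` and its input format are our own choices, not part of the source.

```python
import math


def perplexity(token_log_probs):
    """Exponentiated average negative log-likelihood of the observed tokens.

    token_log_probs: natural-log probabilities that the scoring model
    assigned to each observed token, in order.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

A sequence in which every token receives probability 0.25 therefore has a perplexity of about 4, and lower values indicate text that the scoring model finds more predictable.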
While perplexity is widely used as a simple proxy metric for textual quality, it can also assign favorable scores to highly repetitive or overconfidently generated text, even when such outputs lack meaningful diversity or factual accuracy.

To enhance our text quality assessment, we also calculate the *Net Preference Score* (NPS), another proxy metric in which an *oracle* LLM directly compares watermarked and reference completions for each prompt. In our setup, the oracle compares a watermarked candidate completion against an unwatermarked reference completion generated by the same model under identical settings. It then judges whether the candidate is better, the reference is better, or both are of equal quality. NPS summarizes these judgments as the overall balance between candidate wins and reference wins, with ties included in the total number of comparisons. Positive values indicate that the oracle prefers watermarked completions, negative values indicate a preference for unwatermarked references, and values near zero suggest no clear preference. We use GPT-4o-mini-2024-07-18 as the oracle; the exact query is provided in Appendix [F.3](https://arxiv.org/html/2606.31602#A6.SS3).

### 4.5 Evaluation

In unattacked settings, DEW achieves near-perfect detection of watermarked text at a strict 1% FPR after 200 tokens, with TPRs between 98.8% and 99.8% across configurations. This places it on par with the strongest surface-level schemes and the semantic baselines ATW and TS, both of which attain perfect detection in this setting, while SIR and X-SIR remain slightly lower.

Under paraphrasing, DEW remains the strongest scheme overall. Its best configuration reaches a TPR of 74.6% at 1% FPR and 91.6% at 5% FPR, slightly outperforming ATW and more clearly exceeding SIR, X-SIR, TS, and the surface-level baselines.

Regarding robustness against translation, DEW's advantage on Llama-3.2-3B is more pronounced.
For German translation, it achieves up to 65.0% TPR at 1% FPR, compared to 40.6% for the next-best semantic baseline. For French translation, DEW reaches up to 49.8%, while the strongest semantic baseline attains 26.6%. Although ATW and TS are competitive in unattacked and paraphrased settings, their detection performance degrades substantially under translation. TS comes with the additional downside of being prone to reverse-engineering (Appendix [B.3](https://arxiv.org/html/2606.31602#A2.SS3)).

Finally, DEW can be employed with lower watermark signal strength in applications prioritizing text quality over watermark robustness. At $\lambda=1.5$, DEW achieves an NPS of up to −0.104, indicating only a moderate oracle preference for unwatermarked completions. By contrast, ATW achieves text quality seemingly on par with unwatermarked generations, but at the cost of substantially lower translation robustness and efficiency (Appendix [B.2](https://arxiv.org/html/2606.31602#A2.SS2)). These results suggest that DEW provides a favorable trade-off between text quality, detectability, and robustness.

Due to space constraints, we defer supplementary experiments and analyses covering ablations, computational efficiency, secrecy, robustness to additional attacks, and performance on Gemma-7B to Appendix [B](https://arxiv.org/html/2606.31602#A2).

## 5 Conclusion

This paper presents DEW, a watermarking algorithm with strong robustness to semantically invariant text modifications. We evaluated DEW's detectability, robustness, and text quality through various experiments against a diverse range of watermarking methods. Our results demonstrate that DEW substantially improves translation robustness and achieves the strongest paraphrasing robustness in our evaluation. Further, DEW maintains competitive text quality and incurs markedly lower computational overhead than other semantic watermarks, making it a practical and resilient solution for watermarking LLM-generated text.
## 6 Limitations

Our experiments cover paraphrasing, translation, lexical edits, and a count-based watermark stealing attack, but leave other threats, such as generative attacks, for future work. We also have not yet exhaustively tuned key design choices, including the projection dimensionality $n$, the $\tanh$ scaling factor $\gamma$, embedding models, whitening transformations, and orthogonalization. Moreover, while our spoofing experiments suggest that DEW's signal is not easily exploitable by an existing watermark stealing attack, broader secrecy analyses, including attacks targeting recovery of the secret projection matrices, remain an important direction for future work.

Future research can further improve DEW's practicality by integrating stronger embedding models to broaden language coverage and robustness. When the host LLM provides weak token representations, a specialized auxiliary token embedding model may also be beneficial. Finally, DEW's applicability to instructed dialogue systems and low-entropy settings, including code generation, warrants further study, as do broader benchmarks and user studies assessing effects on perceived quality, factual accuracy, creativity, and relevance.

## Ethical Considerations

This research aims to provide a reliable, practical solution for distinguishing LLM-generated text from human-authored content. It contributes to this broader goal by advancing watermarking methodologies, focusing on enhancing their robustness to semantic transformations while preserving text quality. Deploying watermarks for LLM-generated text can support provenance and accountability, but it also risks false attribution of human-written text, overreliance in high-stakes moderation or legal settings, uneven reliability across languages and writing styles, and adversarial escalation through evasion, removal, or spoofing attacks.
We identify no substantive risks associated with the publication of our watermarking algorithm, and our contribution is purely methodological. The threat models we evaluate are standard in the watermarking literature and can be executed using publicly available tools. Consequently, disclosing our method does not introduce any new adversarial capabilities beyond those already well known in existing watermarking frameworks.

## References

- S. Aaronson and H. Kirchner (2022) Watermarking GPT outputs. Note: PowerPoint slides, presented at the University of Texas at Austin. External Links: [Link](https://www.scottaaronson.com/talks/watermark.ppt). Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p3.1), [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- M. J. Atallah, V. Raskin, C. Hempelmann, M. Karahan, R. Sion, U. Topkara, and K. E. Triezenberg (2002) Natural language watermarking and tamperproofing. In Revised Papers from the 5th International Workshop on Information Hiding, IH ’02, Berlin, Heidelberg, pp. 196–212. External Links: ISBN 3540004211. Cited by: [§2](https://arxiv.org/html/2606.31602#S2.p2.1).
- S. Dathathri, A. See, S. Ghaisas, P. Huang, R. McAdam, J. Welbl, V. Bachani, A. Kaskasoli, R. Stanforth, T. Matejovicova, et al. (2024) Scalable watermarking for identifying large language model outputs. Nature 634(8035), pp. 818–823. External Links: [Document](https://dx.doi.org/10.1038/s41586-024-08025-4), ISSN 0028-0836, 1476-4687. Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p3.1), [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- A. Diera, L. Galke, and A. Scherp (2024) Isotropy matters: soft-ZCA whitening of embeddings for semantic code search. arXiv. External Links: [Document](https://dx.doi.org/10.48550/arXiv.2411.17538), [Link](http://arxiv.org/abs/2411.17538), 2411.17538 [cs]. Cited by: [§3.1.2](https://arxiv.org/html/2606.31602#S3.SS1.SSS2.p2.1).
- A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al. (2024) The llama 3 herd of models. CoRR abs/2407.21783. External Links: [Document](https://dx.doi.org/10.48550/arxiv.2407.21783), [Link](https://doi.org/10.48550/arXiv.2407.21783), 2407.21783. Cited by: [§4.1](https://arxiv.org/html/2606.31602#S4.SS1.p1.1), [§4.4](https://arxiv.org/html/2606.31602#S4.SS4.p1.1).
- Z. He, B. Zhou, H. Hao, A. Liu, X. Wang, Z. Tu, Z. Zhang, and R. Wang (2024) Can watermarks survive translation? on the cross-lingual consistency of text watermark for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L. Ku, A. Martins, and V. Srikumar (Eds.), Bangkok, Thailand, pp. 4115–4129. External Links: [Document](https://dx.doi.org/10.18653/v1/2024.acl-long.226), [Link](https://aclanthology.org/2024.acl-long.226/). Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p7.1), [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p5.1).
- A. Hou, J. Zhang, T. He, Y. Wang, Y. Chuang, H. Wang, L. Shen, B. Van Durme, D. Khashabi, and Y. Tsvetkov (2024a) SemStamp: a semantic watermark with paraphrastic robustness for text generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), K. Duh, H. Gomez, and S. Bethard (Eds.), Mexico City, Mexico, pp. 4067–4082. External Links: [Document](https://dx.doi.org/10.18653/v1/2024.naacl-long.226), [Link](https://aclanthology.org/2024.naacl-long.226/). Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p3.1), [§1](https://arxiv.org/html/2606.31602#S1.p6.1), [§1](https://arxiv.org/html/2606.31602#S1.p7.1), [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p6.4).
- A. Hou, J. Zhang, Y. Wang, D. Khashabi, and T. He (2024b) K-SemStamp: a clustering-based semantic watermark for detection of machine-generated text. In Findings of the Association for Computational Linguistics: ACL 2024, L. Ku, A. Martins, and V. Srikumar (Eds.), Bangkok, Thailand, pp. 1706–1715. External Links: [Document](https://dx.doi.org/10.18653/v1/2024.findings-acl.98), [Link](https://aclanthology.org/2024.findings-acl.98/). Cited by: [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p6.4).
- Z. Hu, L. Chen, X. Wu, Y. Wu, H. Zhang, and H. Huang (2024) Unbiased watermark for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. External Links: [Link](https://openreview.net/forum?id=uWVC5FVidc). Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p2.1).
- J. Huang, D. Tang, W. Zhong, S. Lu, L. Shou, M. Gong, D. Jiang, and N. Duan (2021) WhiteningBERT: an easy unsupervised sentence embedding approach. In Findings of the Association for Computational Linguistics: EMNLP 2021, M. Moens, X. Huang, L. Specia, and S. W. Yih (Eds.), Punta Cana, Dominican Republic, pp. 238–244. External Links: [Document](https://dx.doi.org/10.18653/v1/2021.findings-emnlp.23), [Link](https://aclanthology.org/2021.findings-emnlp.23/). Cited by: [§3.1.2](https://arxiv.org/html/2606.31602#S3.SS1.SSS2.p2.1).
- M. Huo, S. A. Somayajula, Y. Liang, R. Zhang, F. Koushanfar, and P. Xie (2024) Token-specific watermarking with enhanced detectability and semantic coherence for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. Cited by: [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p2.1).
- F. Jelinek, R. L. Mercer, L. R. Bahl, and J. K. Baker (2005) Perplexity–a measure of the difficulty of speech recognition tasks. The Journal of the Acoustical Society of America 62(S1), pp. S63–S63. External Links: [Document](https://dx.doi.org/10.1121/1.2016299), ISSN 0001-4966, [Link](https://doi.org/10.1121/1.2016299), https://pubs.aip.org/asa/jasa/article-pdf/62/S1/S63/11558910/s63_5_online.pdf. Cited by: [§4.4](https://arxiv.org/html/2606.31602#S4.SS4.p1.1).
- Y. Jiang, G. Rajendran, P. Ravikumar, B. Aragam, and V. Veitch (2024) On the origins of linear representations in large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. Cited by: [§A.1](https://arxiv.org/html/2606.31602#A1.SS1.p1.3).
- W. B. Johnson and J. Lindenstrauss (1984) Extensions of lipschitz mappings into a hilbert space. Contemporary Mathematics, pp. 189–206. External Links: [Document](https://dx.doi.org/10.1090/conm/026/737400). Cited by: [§3.1.1](https://arxiv.org/html/2606.31602#S3.SS1.SSS1.p3.8).
- N. Jovanović, R. Staab, and M. Vechev (2024) Watermark stealing in large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. Cited by: [§B.3](https://arxiv.org/html/2606.31602#A2.SS3.p1.1), [§1](https://arxiv.org/html/2606.31602#S1.p4.1), [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- J. Kirchenbauer, J. Geiping, Y. Wen, et al. (2023a) A watermark for large language models. In Proceedings of the 40th International Conference on Machine Learning, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett (Eds.), Proceedings of Machine Learning Research, Vol. 202, pp. 17061–17084. External Links: [Link](https://proceedings.mlr.press/v202/kirchenbauer23a.html). Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p3.1), [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6), [§3.1.4](https://arxiv.org/html/2606.31602#S3.SS1.SSS4.p3.10).
- J. Kirchenbauer, J. Geiping, Y. Wen, M. Shu, K. Saifullah, K. Kong, K. Fernando, A. Saha, M. Goldblum, and T. Goldstein (2023b) On the reliability of watermarks for large language models. CoRR abs/2306.04634. External Links: [Document](https://dx.doi.org/10.48550/arxiv.2306.04634), [Link](https://doi.org/10.48550/arXiv.2306.04634), 2306.04634. Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- K. Krishna, Y. Song, M. Karpinska, J. Wieting, and M. Iyyer (2023) Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA. Cited by: [§B.5](https://arxiv.org/html/2606.31602#A2.SS5.p1.1), [§B.5](https://arxiv.org/html/2606.31602#A2.SS5.p3.1), [Table 6](https://arxiv.org/html/2606.31602#A2.T6).
- R. Kuditipudi, J. Thickstun, T. Hashimoto, and P. Liang (2024) Robust distortion-free watermarks for language models. Trans. Mach. Learn. Res. 2024. External Links: [Link](https://openreview.net/forum?id=FpaCL1MO2C). Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6), [§2](https://arxiv.org/html/2606.31602#S2.p1.1).
- A. Liu, L. Pan, X. Hu, S. Meng, and L. Wen (2024) A semantic invariant robust watermark for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. External Links: [Link](https://openreview.net/forum?id=6p8lpe4MNf). Cited by: [§1](https://arxiv.org/html/2606.31602#S1.p3.1), [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p4.1).
- Y. Liu and Y. Bu (2024) Adaptive text watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. Cited by: [§B.6](https://arxiv.org/html/2606.31602#A2.SS6.p6.1), [§2.2](https://arxiv.org/html/2606.31602#S2.SS2.p3.1).
- T. Mesnard, C. Hardin, R. Dadashi, S. Bhupatiraju, S. Pathak, L. Sifre, M. Rivière, M. S. Kale, J. Love, P. Tafti, et al. (2024) Gemma: open models based on gemini research and technology. CoRR abs/2403.08295. External Links: [Document](https://dx.doi.org/10.48550/arxiv.2403.08295), [Link](https://doi.org/10.48550/arXiv.2403.08295), 2403.08295. Cited by: [§B.6](https://arxiv.org/html/2606.31602#A2.SS6.p1.3), [§4.1](https://arxiv.org/html/2606.31602#S4.SS1.p1.1).
- L. Pan, A. Liu, Z. He, Z. Gao, X. Zhao, Y. Lu, B. Zhou, S. Liu, X. Hu, L. Wen, et al. (2024) MarkLLM: an open-source toolkit for LLM watermarking. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, D. I. Hernandez Farias, T. Hope, and M. Li (Eds.), Miami, Florida, USA, pp. 61–71. Note: Apache License 2.0. External Links: [Document](https://dx.doi.org/10.18653/v1/2024.emnlp-demo.7), [Link](https://aclanthology.org/2024.emnlp-demo.7/). Cited by: [§B.2](https://arxiv.org/html/2606.31602#A2.SS2.p2.1), [§B.3](https://arxiv.org/html/2606.31602#A2.SS3.p1.1), [Table 8](https://arxiv.org/html/2606.31602#A5.T8), [Appendix E](https://arxiv.org/html/2606.31602#A5.p1.1), [§3.2](https://arxiv.org/html/2606.31602#S3.SS2.p3.1), [§4.1](https://arxiv.org/html/2606.31602#S4.SS1.p3.7), [§4.3](https://arxiv.org/html/2606.31602#S4.SS3.p1.1).
- G. Papandreou and A. L. Yuille (2011) Perturb-and-MAP random fields: using discrete optimization to learn and sample from energy models. In 2011 International Conference on Computer Vision, pp. 193–200. External Links: [Document](https://dx.doi.org/10.1109/iccv.2011.6126242), [Link](https://ieeexplore.ieee.org/document/6126242). Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21(1). External Links: ISSN 1532-4435. Cited by: [§4.2](https://arxiv.org/html/2606.31602#S4.SS2.p1.1).
- N. Reimers and I. Gurevych (2019) Sentence-BERT: sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan (Eds.), Hong Kong, China, pp. 3982–3992. External Links: [Document](https://dx.doi.org/10.18653/v1/D19-1410), [Link](https://aclanthology.org/D19-1410/). Cited by: [§4.1](https://arxiv.org/html/2606.31602#S4.SS1.p2.3).
- R. Tang, Y. Chuang, and X. Hu (2024) The science of detecting llm-generated text. Commun. ACM 67(4), pp. 50–59. External Links: [Document](https://dx.doi.org/10.1145/3624725), ISSN 0001-0782, [Link](https://doi.org/10.1145/3624725). Cited by: [§2](https://arxiv.org/html/2606.31602#S2.p2.1).
- U. Topkara, M. Topkara, and M. J. Atallah (2006) The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. In Proceedings of the 8th Workshop on Multimedia and Security, MM&Sec ’06, New York, NY, USA, pp. 164–174. External Links: [Document](https://dx.doi.org/10.1145/1161366.1161397), ISBN 1595934936, [Link](https://doi.org/10.1145/1161366.1161397). Cited by: [§2](https://arxiv.org/html/2606.31602#S2.p2.1).
- Y. Wu, R. Chen, Z. Hu, Y. Chen, J. Guo, H. Zhang, and H. Huang (2024a) Distortion-free watermarks are not truly distortion-free under watermark key collisions. CoRR abs/2406.02603. External Links: [Document](https://dx.doi.org/10.48550/arxiv.2406.02603), [Link](https://doi.org/10.48550/arXiv.2406.02603), 2406.02603. Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6), [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p2.1).
- Y. Wu, Z. Hu, J. Guo, H. Zhang, and H. Huang (2024b) A resilient and accessible distribution-preserving watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p2.1).
- X. Zhao, P. V. Ananth, L. Li, and Y. Wang (2024) Provable robust watermarking for ai-generated text. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. External Links: [Link](https://openreview.net/forum?id=SsmT8aO45L). Cited by: [§2.1](https://arxiv.org/html/2606.31602#S2.SS1.p1.6).
- Z. Ziegler, Y. Deng, and A. Rush (2019) Neural linguistic steganography. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), K. Inui, J. Jiang, V. Ng, and X. Wan (Eds.), pp. 1210–1215. External Links: [Document](https://dx.doi.org/10.18653/v1/D19-1115), [Link](https://aclanthology.org/D19-1115). Cited by: [§2](https://arxiv.org/html/2606.31602#S2.p2.1).

## Appendix A Geometry and Statistical Foundations of DEW

Reader map. Section [A.1](https://arxiv.org/html/2606.31602#A1.SS1) frames next-token prediction as the composition of a *context representation* map and a *token scoring* (unembedding) map, and explains why DEW mirrors this structure via keyed signal processing on embeddings. Section [A.2](https://arxiv.org/html/2606.31602#A1.SS2) derives the null distributions of the core alignment score, emphasizing that the exact law is Beta-type while a Gaussian approximation emerges in high dimensions. Section [A.3](https://arxiv.org/html/2606.31602#A1.SS3) states a concise one-sided hypothesis test for watermark detection and clarifies the approximation points.

### A.1 Inner-product geometry of next-token prediction

Let $x_{<t}$ denote the prefix and $w$ a candidate next token.
Decoder-only LLMs can be abstracted as two coupled maps: a *context map* that builds a representation of the prefix, and a typically near-linear *token scoring* (unembedding) map that produces next-token logits,

$$
\begin{aligned}
h_t &= f_\theta(x_{<t}) \in \mathbb{R}^{d},\\
\ell_t(w) &\approx \langle W_U[w],\, h_t\rangle + b_w,\\
\mathbb{P}(w \mid x_{<t}) &= \mathrm{softmax}(\ell_t)(w).
\end{aligned}
$$

This factorization makes clear why inner products are a natural primitive for next-token selection. Moreover, recent theory suggests that training pressure can make latent variables ("concept" directions) linearly accessible in representation space Jiang et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib42)). Thus, small perturbations expressed as controlled linear scores can interact smoothly with semantics and degrade gracefully under semantic shifts.

DEW mirrors this structure externally (without access to the model's internal residual stream) using embedding models and keyed linear maps.
For each position $i$, let $\mathbf{c}^{(i)}=(x_{i-w},\ldots,x_{i-1})$ be the context window and define

$$
\begin{aligned}
\mathbf{e}_C^{(i)} &= M_C(\mathbf{c}^{(i)}) \in \mathbb{R}^{d_C}, &\qquad \hat{\mathbf{e}}_C^{(i)} &= \frac{\mathbf{e}_C^{(i)}}{\|\mathbf{e}_C^{(i)}\|},\\
\mathbf{p}_C^{(i)} &= \frac{\hat{\mathbf{e}}_C^{(i)}\mathbf{R}_C}{\|\hat{\mathbf{e}}_C^{(i)}\mathbf{R}_C\|} \in \mathbb{R}^{n}, &\qquad \mathbf{p}_c^{(i)} &= (\mathbf{p}_C^{(i)})^{\top} \in \mathbb{R}^{n}.
\end{aligned}
$$

Similarly, let $\mathbf{e}_T^{(i)}\in\mathbb{R}^{d_T}$ be the whitened token embedding of $x_i$ and $\hat{\mathbf{e}}_T^{(i)}=\frac{\mathbf{e}_T^{(i)}}{\|\mathbf{e}_T^{(i)}\|}$; define

$$
\mathbf{p}_T^{(i)} = \frac{\hat{\mathbf{e}}_T^{(i)}\mathbf{R}_T}{\|\hat{\mathbf{e}}_T^{(i)}\mathbf{R}_T\|} \in \mathbb{R}^{n}.
$$

The core alignment score is the cosine similarity, expressed as the dot product

$$
Z^{(i)} := \mathbf{p}_T^{(i)}\mathbf{p}_c^{(i)} \in [-1,1],
$$

which is then mapped (monotonically, e.g., via $\tanh$) to a logit bias. Hence, DEW can be viewed as a keyed signal-processing layer that injects a small, structured logit bias consistent with the inner-product geometry underlying next-token prediction.
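The projection pipeline of this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are passed in as lists of floats (with the token embedding assumed to be whitened already), the projection matrices are filled with i.i.d. standard normal entries from a seeded generator, and the helper names `keyed_matrix`, `project`, and `alignment` as well as the way seeds are derived from the secret key are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def keyed_matrix(key, d, n):
    """d x n matrix of i.i.d. standard normal entries, reproducible from `key`."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]


def project(e, R):
    """Normalize row vector e, multiply by R (d x n), and renormalize."""
    e_hat = normalize(e)
    n = len(R[0])
    p = [sum(e_hat[j] * R[j][k] for j in range(len(e_hat))) for k in range(n)]
    return normalize(p)


def alignment(e_token, e_context, R_T, R_C):
    """Alignment score Z: dot product of the two projected unit vectors."""
    p_T = project(e_token, R_T)
    p_c = project(e_context, R_C)
    return sum(a * b for a, b in zip(p_T, p_c))
```

Because both projected vectors have unit length, the returned value is a cosine similarity in $[-1,1]$, and rescaling either input embedding leaves it unchanged.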
### A.2 Underlying distributions of the alignment score

This section characterizes the baseline distribution of the dot product $Z:=\mathbf{p}_T\mathbf{p}_c\in[-1,1]$ (and its scaled version $\sqrt{n}\,Z$), which underpins both watermark insertion and detection, and is used to calibrate false-positive control in Section [A.3](https://arxiv.org/html/2606.31602#A1.SS3).

##### Setup and what is (approximately) known.

Token embeddings are whitened in preprocessing, making $M_T(x)$ approximately isotropic for tokens $x\in\mathcal{V}$. After a key-seeded random projection and normalization, it is reasonable to model $\mathbf{p}_T$ as approximately uniform on the unit sphere $\mathbf{S}^{n-1}$. Context embeddings $\mathbf{e}_C=M_C(\mathbf{c})$ are not generally isotropic (they often lie in an anisotropic cone that reflects semantic constraints), so $\mathbf{p}_c$ need not be uniform. Crucially, for this baseline distribution, it suffices that, conditional on the context projection $\mathbf{p}_c$, the token projection $\mathbf{p}_T$ is approximately uniform on $\mathbf{S}^{n-1}$.

###### Assumption A.1 (Conditionally spherical token projection (baseline)).

In the non-watermarked regime used for false-positive calibration, for each position $i$, conditional on the context projection $\mathbf{p}_c^{(i)}$, the normalized projected token vector satisfies $\mathbf{p}_T^{(i)}\approx\mathcal{U}(\mathbf{S}^{n-1})$.

##### Why Assumption [A.1](https://arxiv.org/html/2606.31602#A1.Thmtheorem1) is plausible under a fixed key.

Although the secret key fixes $\mathbf{R}_T$ for all documents, randomness remains through the token sequence under $H_0$. For a freshly sampled Gaussian projection and any fixed unit embedding vector, the projected vector is spherical before normalization.
In deployment, however, the key is fixed, so this key-averaged sphericality becomes an approximation over the empirical distribution of tokens and documents. Whitening and normalization make this approximation more plausible by reducing dominant anisotropic directions in the token embeddings, but they do not make the fixed-key null exactly spherical. Residual deviations from the spherical model can be handled by conservative calibration, e.g., using $L_{\mathrm{eff}}$ or empirical null estimation.

##### Exact law (Beta-type).

###### Lemma A.2 (Dot product with a spherical vector).

Let $Y\sim\mathcal{U}(\mathbf{S}^{n-1})$ and let $x\in\mathbf{S}^{n-1}$ be any fixed unit vector. Then $x^{\top}Y$ has density

$$
f(z)=\frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}(1-z^{2})^{\frac{n-3}{2}},\qquad z\in[-1,1].
$$

Equivalently, $\frac{1+z}{2}\sim\mathrm{Beta}\!\left(\frac{n-1}{2},\frac{n-1}{2}\right)$. In particular, $\mathbb{E}[x^{\top}Y]=0$ and $\mathrm{Var}(x^{\top}Y)=\frac{1}{n}$.

Applying the lemma conditionally with $Y=\mathbf{p}_T$ and $x=\mathbf{p}_c$ (treating $\mathbf{p}_c$ as fixed or slowly varying) yields an exact description of $Z$ in the baseline regime as long as $\mathbf{p}_T$ is (approximately) uniform on the sphere, regardless of whether $\mathbf{p}_c$ is anisotropic.

###### Lemma A.3 (High-dimensional Gaussian approximation).

Under Assumption [A.1](https://arxiv.org/html/2606.31602#A1.Thmtheorem1), let $Z:=\mathbf{p}_T\mathbf{p}_c\in[-1,1]$ with $\mathbf{p}_c\in\mathbf{S}^{n-1}$ treated as fixed (or conditioned upon). Then, as $n\to\infty$,

$$
\sqrt{n}\,Z\;\xRightarrow{\mathbb{P}}\;\mathcal{N}(0,1).
$$

Lemma [A.3](https://arxiv.org/html/2606.31602#A1.Thmtheorem3) motivates scaling by $\sqrt{n}$ so that the per-token score has approximately unit variance in the baseline regime.

### A.3 One-sided hypothesis test for watermark detection

DEW's detector can be interpreted as a one-sided hypothesis test with false-positive control. For brevity, we first present the linear (non-saturated) statistic; the $\tanh$ nonlinearity is discussed at the end. Let $b_1,\dots,b_L$ be token-level bias scores for a document (Algorithm [2](https://arxiv.org/html/2606.31602#alg2)):

$$
b_i\;=\;\lambda\sqrt{n}\,\bigl(\mathbf{p}_T^{(i)}\mathbf{p}_c^{(i)}\bigr)\in\mathbb{R},\qquad\bar{b}\;=\;\frac{1}{L}\sum_{i=1}^{L}b_i.
$$

##### Null $H_0$ (not watermarked).

Under $H_0$, token selection is not influenced by the key. Assumption [A.1](https://arxiv.org/html/2606.31602#A1.Thmtheorem1) formalizes the resulting spherical model for $\mathbf{p}_T^{(i)}$, and Lemma [A.2](https://arxiv.org/html/2606.31602#A1.Thmtheorem2) yields the exact per-token dot-product law. Using Section [A.2](https://arxiv.org/html/2606.31602#A1.SS2) with $\mathbf{p}_T^{(i)}\approx\mathcal{U}(\mathbf{S}^{n-1})$, we have (conditionally on $\mathbf{p}_c^{(i)}$)

$$
\mathbb{E}[b_i]=0,\qquad\mathrm{Var}(b_i)=\lambda^{2},
$$

and the exact single-token distribution is $\lambda\sqrt{n}\,Z$ where $Z$ is Beta-type on $[-1,1]$.
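The null moments stated above can be sanity-checked by simulation. The sketch below is an illustration under the idealized spherical model of Assumption A.1, not an experiment from the paper: it draws vectors uniformly on the sphere by normalizing Gaussian samples, fixes the context direction to the first coordinate axis (which is without loss of generality by rotational symmetry), and uses hypothetical helper names.

```python
import math
import random


def sample_null_scores(n, lam, num_samples, seed=0):
    """Draw b = lam * sqrt(n) * <x, Y> with Y uniform on the sphere S^{n-1}."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        # With x fixed to the first basis vector, <x, Y> is Y's first coordinate.
        scores.append(lam * math.sqrt(n) * g[0] / norm)
    return scores


def mean_and_variance(xs):
    """Sample mean and (population) variance of xs."""
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / len(xs)
```

Calling `mean_and_variance(sample_null_scores(32, 1.5, 20000))` should give a mean close to 0 and a variance close to $\lambda^{2}=2.25$, up to Monte Carlo error.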
For document-level inference, we use a CLT approximation: if $(b_{i})$ are independent or weakly dependent with an effective sample size $L_{\mathrm{eff}} \leq L$, then

$$\bar{b} \;\approx\; \mathcal{N}\!\left(0, \frac{\lambda^{2}}{L_{\mathrm{eff}}}\right).$$

(Practically, $L_{\mathrm{eff}}$ can be set to $L$ under an i.i.d. approximation, or conservatively reduced to account for correlations across nearby tokens.)

##### Alternative $H_{1}$ (watermarked with DEW).

Under $H_{1}$, DEW biases token probabilities toward larger alignments $\mathbf{p}_{T}^{(i)}\mathbf{p}_{c}^{(i)}$, inducing a positive mean shift:

$$\mathbb{E}[b_{i}] = \mu > 0,$$

and thus $\mathbb{E}[\bar{b}] = \mu > 0$, while the variance remains comparable for small watermark strengths.

##### Test statistic and rejection rule.

We test $H_{0}: \mu \leq 0$ against $H_{1}: \mu > 0$ using

$$Z_{L} \;=\; \frac{\bar{b}}{\lambda / \sqrt{L_{\mathrm{eff}}}} \;=\; \frac{\sqrt{L_{\mathrm{eff}}}\,\bar{b}}{\lambda}.$$

Under $H_{0}$, $Z_{L} \approx \mathcal{N}(0,1)$, and the one-sided p-value is $p = 1 - \Phi(Z_{L})$. At significance level $\alpha$, reject $H_{0}$ if $Z_{L} > z_{\alpha}$ (equivalently $p < \alpha$), where $z_{\alpha}$ is the $(1-\alpha)$-quantile of the standard normal.

##### Analytic classification threshold and empirical agreement.
The rejection rule $Z_{L} > z_{\alpha}$ is equivalent to an analytic threshold on the document score,

$$\bar{b} \;>\; \tau_{\alpha} \qquad \text{with} \qquad \tau_{\alpha} \;:=\; \frac{\lambda}{\sqrt{L_{\mathrm{eff}}}}\,z_{\alpha},$$

which yields a closed-form decision boundary for any target false-positive rate $\alpha$ under the Gaussian null approximation.

## Appendix B Supplementary Experiments

### B.1 Ablation Study

To isolate the roles of token- and context-level semantics, we evaluate four variants of DEW:

- `both`: unmodified DEW (baseline).
- `context_only`: token semantics are removed by randomly permuting the whitened token projections at initialization.
- `token_only`: context semantics are removed by replacing the context projection with a pseudo-random unit vector seeded by the context token IDs.
- `neither`: both ablations are applied simultaneously.

Table 2: True positive rates in unattacked, post-paraphrasing, and post-translation scenarios at false positive rates of 1 and 5 percent, evaluated on human-authored texts. The highest scores across all configurations are highlighted in bold. The text quality measures are computed on unmodified watermarked text. The PPL score represents the median perplexity across all texts. Numbers for the `both` mode were copied from Table [1](https://arxiv.org/html/2606.31602#S4.T1) for easier comparison.

| Embedding Mode | Unmodified, 1% FPR | Robustness (pp), 1% FPR | Robustness (pp), 5% FPR | Robustness (tr-de), 1% FPR | Robustness (tr-de), 5% FPR | Robustness (tr-fr), 1% FPR | Robustness (tr-fr), 5% FPR | PPL ↓ | NPS ↑ |
|---|---|---|---|---|---|---|---|---|---|
| `both` (default) | 0.998 | 0.746 | 0.916 | 0.650 | 0.906 | 0.498 | 0.806 | 10.750 | −0.226 |
| `context_only` | 1.000 | 0.740 | 0.938 | 0.002 | 0.022 | 0.000 | 0.000 | 10.563 | −0.202 |
| `token_only` | 1.000 | 0.224 | 0.424 | 0.000 | 0.016 | 0.000 | 0.000 | 10.438 | −0.084 |
| `neither` | 0.998 | 0.400 | 0.550 | 0.008 | 0.046 | 0.000 | 0.000 | 10.438 | −0.114 |

Table [2](https://arxiv.org/html/2606.31602#A2.T2) reports results for $k=3$, $\lambda=2$, and the hyperparameters from Section [4.1](https://arxiv.org/html/2606.31602#S4.SS1).

##### Paraphrasing.

Removing token semantics alone (`context_only`) leaves paraphrase robustness nearly unchanged, whereas removing context semantics (`token_only`) causes a large drop. The stronger performance of `neither` over `token_only` should not be interpreted as improved semantic robustness: once the context side is pseudo-random and keyed to exact token IDs, lexical similarity no longer provides a stable alignment signal. Instead, randomly permuting token projections in `neither` likely decorrelates the top-$m$ candidate scores and can yield a slightly larger insertion margin.

##### Translation.

Translation largely destroys local $k$-grams, so the ablated variants lose robustness. The full method requires both token and context semantics to transfer reliably across languages.

### B.2 Computational Efficiency

Table 3: Computational efficiency of various watermarking schemes with generation and detection times measured in seconds, computed over 500 texts with 200 tokens each. The lowest average times across all watermarking schemes are highlighted in bold, while the lowest times within each category (semantic/surface-level) are underlined.

| Category | Scheme | Generation (sec), Average | Generation (sec), Median | Generation (sec), Std. Dev. | Detection (sec), Average | Detection (sec), Median | Detection (sec), Std. Dev. |
|---|---|---|---|---|---|---|---|
| Semantic | DEW | 4.971 | 4.807 | 0.113 | 0.047 | 0.048 | 0.001 |
| Semantic | SIR | 6.875 | 6.870 | 0.125 | 0.276 | 0.279 | 0.023 |
| Semantic | X-SIR | 5.983 | 5.867 | 0.289 | 0.196 | 0.193 | 0.009 |
| Semantic | ATW | 10.499 | 10.590 | 0.546 | 6.353 | 6.393 | 0.430 |
| Semantic | TS | 3.811 | 3.801 | 0.061 | 0.095 | 0.096 | 0.003 |
| Surface-level | SynthID-D | 4.562 | 4.559 | 0.101 | 0.001 | 0.001 | 0.000 |
| Surface-level | SynthID-ND | 4.229 | 4.266 | 0.079 | 0.020 | 0.020 | 0.000 |
| Surface-level | KGW | 3.727 | 3.712 | 0.051 | 0.036 | 0.036 | 0.000 |
| Surface-level | DiPmark | 3.877 | 3.877 | 0.070 | 0.058 | 0.058 | 0.001 |
| Surface-level | UnbiasedWM | 3.889 | 3.922 | 0.056 | 0.231 | 0.230 | 0.014 |
| | (no watermark) | 3.707 | 3.709 | 0.045 | – | – | – |

Table [3](https://arxiv.org/html/2606.31602#A2.T3) presents the generation and detection runtimes for all evaluated watermarking schemes, measured under the experimental setup detailed in Section [4](https://arxiv.org/html/2606.31602#S4) and Appendix [E](https://arxiv.org/html/2606.31602#A5), with $k=3$ for all applicable methods. All experiments were conducted on a system featuring an Intel i9-10980XE CPU paired with an NVIDIA RTX A5000 GPU, which was used both for text generation and to accelerate detection in schemes that leverage GPU processing. To run each scheme, we used the publicly available implementations from the MarkLLM toolkit Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)). However, these implementations are generally not optimized for runtime performance, and the reported numbers may therefore overestimate the computational overhead in a production-grade deployment.

Notably, DEW remains one of the most efficient semantic watermarks during generation, substantially faster than SIR, X-SIR, and especially ATW, though TS is the clear exception with runtime on par with the best-performing surface-level watermarks. Furthermore, DEW's detection is highly efficient, outperforming all semantic baselines including TS and remaining faster than DiPmark and UnbiasedWM on 200-token inputs, with only the lightest surface-level schemes such as SynthID and KGW detecting faster.

##### Compute budget.
The main experiments required roughly 25 GPU-hours on a single NVIDIA RTX A5000, while the full set of reported local generation and detection experiments required approximately 75–80 GPU-hours; this estimate excludes remote API calls used for paraphrasing, translation, and LLM-based quality evaluation.

### B.3 Secrecy Evaluation via Watermark Stealing

We evaluate secrecy against the watermark stealing (WS) attack of Jovanović et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib17)), as implemented in MarkLLM Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)). The attack operates in a black-box spoofing setting: the adversary observes text generated by a victim watermarked language model and estimates token-level continuation patterns that distinguish watermarked from unwatermarked generations. These estimates are then used to reweight the logits of an attacker-controlled language model, producing new texts that are intended to be accepted by the victim's detector.

We generate a stolen corpus of 2000 watermarked completions with 200 tokens each. The attacker uses the same base language model as the victim model, so the attack is favorable to the adversary and differs only in the WS logit reweighting. For the stealing model, we condition the estimated token biases on the three preceding tokens for all watermarks except TS, where we condition on only the immediately preceding token to match its context window. For DEW, we evaluate only the $k=3, \lambda=2.0$ configuration, as its short context length and high watermark strength make it the most susceptible configuration to stealing. We evaluate the attack by comparing 200 stolen-generated texts against 200 held-out human-authored texts and report the TPR of stolen texts at fixed FPRs of 1 and 5 percent.

Table 4: Spoofing success of the watermark stealing attack. The TPR is computed on stolen-generated texts at fixed false positive rates on held-out human-authored texts. Lower values indicate stronger secrecy against this attack.

| Watermark (config) | 1% FPR ↓ | 5% FPR ↓ |
|---|---|---|
| DEW ($k=3$, $\lambda=2.0$) | 0.025 | 0.065 |
| SIR | 0.115 | 0.200 |
| X-SIR | 0.025 | 0.090 |
| ATW | 0.010 | 0.050 |
| TS | 0.995 | 1.000 |

Table [4](https://arxiv.org/html/2606.31602#A2.T4) shows that TS is almost completely vulnerable to stealing, with nearly all stolen-generated texts detected as watermarked. SIR also exhibits non-trivial spoofing success, reaching 20.0 percent TPR at 5 percent FPR. In contrast, DEW remains close to the nominal false-positive levels, with TPRs of 2.5 and 6.5 percent at the two operating points. This is comparable to X-SIR and only slightly above ATW, suggesting that the count-based WS attack does not recover a transferable DEW signal from the stolen corpus.

These results provide evidence that DEW's signal is not easily exposed as a fixed context-token continuation bias. This is consistent with the design of DEW, where the watermark signal depends on continuous semantic alignment between projected context and token embeddings rather than on a fixed green-list structure. Nevertheless, this evaluation only rules out this particular count-based stealing attack; stronger attacks targeting the semantic projection mechanism remain an important direction for future work.

### B.4 Robustness to Lexical Edits

Table 5: True positive rates after word deletion and synonym substitution applied to 10, 30 and 50 percent of the original watermarked words, at false positive rates of 1 and 5 percent, evaluated on human-authored texts. The highest scores in each column are highlighted in bold. Column labels give the edit type (Del. = word deletion, Syn. = synonym substitution), the edit ratio, and the FPR.

| Watermark (config) | Del. 10%, 1% FPR | Del. 10%, 5% FPR | Del. 30%, 1% FPR | Del. 30%, 5% FPR | Del. 50%, 1% FPR | Del. 50%, 5% FPR | Syn. 10%, 1% FPR | Syn. 10%, 5% FPR | Syn. 30%, 1% FPR | Syn. 30%, 5% FPR | Syn. 50%, 1% FPR | Syn. 50%, 5% FPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DEW ($k=3$, $\lambda=1.5$) | 0.978 | 0.994 | 0.918 | 0.988 | 0.742 | 0.910 | 0.980 | 0.998 | 0.940 | 0.988 | 0.818 | 0.944 |
| DEW ($k=3$, $\lambda=2.0$) | 0.998 | 0.998 | 0.992 | 0.996 | 0.942 | 0.974 | 0.998 | 0.998 | 0.996 | 0.998 | 0.978 | 0.992 |
| DEW ($k=5$, $\lambda=1.5$) | 0.792 | 0.974 | 0.706 | 0.928 | 0.566 | 0.858 | 0.754 | 0.972 | 0.546 | 0.900 | 0.368 | 0.796 |
| DEW ($k=5$, $\lambda=2.0$) | 0.962 | 0.996 | 0.904 | 0.988 | 0.794 | 0.950 | 0.948 | 0.996 | 0.826 | 0.984 | 0.662 | 0.944 |
| SIR | 0.966 | 0.984 | 0.940 | 0.978 | 0.904 | 0.944 | 0.966 | 0.990 | 0.932 | 0.984 | 0.878 | 0.952 |
| X-SIR | 0.936 | 0.972 | 0.928 | 0.966 | 0.914 | 0.952 | 0.920 | 0.962 | 0.864 | 0.944 | 0.824 | 0.914 |
| ATW | 0.996 | 1.000 | 0.860 | 0.962 | 0.638 | 0.842 | 1.000 | 1.000 | 0.974 | 0.998 | 0.834 | 0.946 |
| TS | 1.000 | 1.000 | 0.980 | 0.992 | 0.808 | 0.930 | 1.000 | 1.000 | 0.990 | 0.996 | 0.948 | 0.980 |

Table [5](https://arxiv.org/html/2606.31602#A2.T5) evaluates robustness to lightweight lexical edits. Word deletion randomly removes whitespace-separated words with probability $r$, whereas context-aware synonym substitution first selects words with WordNet entries and then replaces masked positions with the top prediction of google-bert/bert-large-uncased. Thus, the latter should be interpreted as a contextual masked-token substitution rather than a strictly synonym-constrained transformation.

Overall, these edits are less destructive than paraphrasing and translation, with most semantic watermarks retaining high detection rates even at larger perturbation ratios. DEW is strongest in the more severe 30 and 50 percent settings, where the $k=3, \lambda=2.0$ configuration achieves the best or tied-best TPR in nearly all columns. This trend is particularly clear at 50 percent synonym substitution, suggesting that DEW's combined token- and context-level signal remains stable under local lexical variation.
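The word-deletion edit described above can be sketched in a few lines. This is an illustrative reading of the description, not the evaluation code itself: it assumes each whitespace-separated word is dropped independently with probability $r$, and the function name `delete_words` and its `seed` parameter are hypothetical.

```python
import random

def delete_words(text, r, seed=0):
    """Drop each whitespace-separated word independently with probability r.

    Assumption: deletions are independent per word, so the realized deletion
    ratio only matches r in expectation.
    """
    rng = random.Random(seed)
    words = text.split()
    # rng.random() lies in [0, 1), so a word survives with probability 1 - r.
    kept = [w for w in words if rng.random() >= r]
    return " ".join(kept)

sample = "the quick brown fox jumps over the lazy dog"
for ratio in (0.1, 0.3, 0.5):
    print(ratio, "->", delete_words(sample, ratio))
```

Because the contexts used by DEW are built from the $k$ preceding tokens, each deletion perturbs the context of the next few tokens as well, which may be one reason why detection degrades faster for $k=5$ than for $k=3$ in Table 5.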
### B.5 Robustness to DIPPER Paraphrasing

Table 6: True positive rates for DEW after paraphrasing with the DIPPER model Krishna et al. ([2023](https://arxiv.org/html/2606.31602#bib.bib20)) for different configurations and false positive rates of 1 and 5 percent, evaluated on human-authored texts. The highest scores in each column are highlighted in bold. The DIPPER hyperparameters *ld* and *od* stand for *lexical diversity* and *order diversity*, respectively.

| Watermark (config) | (ld=60, od=60), 1% FPR | (ld=60, od=60), 5% FPR | (ld=40, od=100), 1% FPR | (ld=40, od=100), 5% FPR | (ld=60, od=20), 1% FPR | (ld=60, od=20), 5% FPR | (ld=40, od=0), 1% FPR | (ld=40, od=0), 5% FPR |
|---|---|---|---|---|---|---|---|---|
| DEW ($k=3$, $\lambda=1.5$) | 0.584 | 0.744 | 0.696 | 0.836 | 0.694 | 0.822 | 0.886 | 0.950 |
| DEW ($k=3$, $\lambda=2.0$) | 0.684 | 0.818 | 0.798 | 0.886 | 0.814 | 0.912 | 0.962 | 0.984 |
| DEW ($k=5$, $\lambda=1.5$) | 0.596 | 0.674 | 0.566 | 0.780 | 0.556 | 0.758 | 0.758 | 0.918 |
| DEW ($k=5$, $\lambda=2.0$) | 0.596 | 0.768 | 0.732 | 0.876 | 0.740 | 0.882 | 0.896 | 0.960 |

DIPPER Krishna et al. ([2023](https://arxiv.org/html/2606.31602#bib.bib20)) is an 11B-parameter paraphrase model trained to evade detectors for LLM-generated text, including watermarking. It conditions on the surrounding context and exposes fine-grained controls over lexical diversity and content reordering while aiming to preserve input semantics.

Table [6](https://arxiv.org/html/2606.31602#A2.T6) reports the robustness of DEW under four DIPPER configurations. Across all settings, the stronger $\lambda=2.0$ configurations consistently improve detection after paraphrasing, with $k=3, \lambda=2.0$ achieving the highest TPR in every column. The results also show that DIPPER configurations with lower lexical diversity are less effective at removing the watermark: the least aggressive setting, (ld=40, od=0), preserves near-perfect detectability, whereas the higher-diversity setting (ld=60, od=60) yields the lowest TPRs.
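The TPR-at-fixed-FPR figures reported in these tables can be computed from raw detector scores along the following lines. This is a sketch under the assumption that the decision threshold is set empirically as the $(1-\alpha)$ quantile of document scores on human-authored texts (the captions state only that false positive rates are evaluated on human-authored texts); `tpr_at_fpr` and its argument names are hypothetical, and NumPy's default linear quantile interpolation is an arbitrary choice.

```python
import numpy as np

def tpr_at_fpr(null_scores, wm_scores, fpr):
    """Return (TPR, threshold) at a target false-positive rate.

    null_scores: detector scores on human-authored (unwatermarked) texts.
    wm_scores:   detector scores on watermarked, possibly attacked, texts.
    The threshold is the (1 - fpr) empirical quantile of the null scores.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    wm_scores = np.asarray(wm_scores, dtype=float)
    threshold = np.quantile(null_scores, 1.0 - fpr)
    tpr = float(np.mean(wm_scores > threshold))
    return tpr, float(threshold)

# Synthetic scores for illustration only; these are not results from the paper.
rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=500)
watermarked = rng.normal(3.0, 1.0, size=500)
for alpha in (0.01, 0.05):
    print(alpha, tpr_at_fpr(human, watermarked, alpha))
```

With only a few hundred human-authored texts, the empirical 1 percent threshold rests on a handful of tail samples, so an analytic threshold derived from a null model is a natural cross-check.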
It is important to consider that paraphrasing via DIPPER can significantly compromise text quality, particularly in high-diversity configurations Krishna et al. ([2023](https://arxiv.org/html/2606.31602#bib.bib20)). Furthermore, such configurations increase the likelihood of altering the semantic meaning of the input text, which not only removes semantic watermarks but also reduces the usefulness of the paraphrased text for the attacker.

### B.6 Robustness on Gemma-7B

Table 7: True positive rates in unattacked, post-paraphrasing, and post-translation scenarios at false positive rates of 1 and 5 percent, evaluated on human-authored texts for Gemma-7B. The best scores across all watermarking schemes are highlighted in bold. The text quality measures are computed on unmodified watermarked text. The PPL score represents the median perplexity across all texts. Rows marked with $\dagger$ indicate that the DEW watermark was inserted and detected using Llama-3.2-3B token embeddings instead of the original Gemma-7B token embeddings.

| Watermark (config) | Unmodified, 1% FPR | Robustness (pp), 1% FPR | Robustness (pp), 5% FPR | Robustness (tr-de), 1% FPR | Robustness (tr-de), 5% FPR | Robustness (tr-fr), 1% FPR | Robustness (tr-fr), 5% FPR | PPL ↓ | NPS ↑ |
|---|---|---|---|---|---|---|---|---|---|
| DEW ($k=3$, $\lambda=1.5$) | 0.994 | 0.438 | 0.666 | 0.074 | 0.122 | 0.288 | 0.534 | 12.188 | −0.022 |
| DEW ($k=3$, $\lambda=2.0$) | 1.000 | 0.654 | 0.830 | 0.108 | 0.196 | 0.428 | 0.640 | 13.813 | −0.074 |
| DEW ($k=5$, $\lambda=1.5$) | 0.984 | 0.468 | 0.706 | 0.058 | 0.134 | 0.246 | 0.496 | 12.188 | −0.048 |
| DEW ($k=5$, $\lambda=2.0$) | 1.000 | 0.632 | 0.846 | 0.078 | 0.204 | 0.310 | 0.620 | 14.000 | −0.108 |
| SIR | 0.968 | 0.528 | 0.782 | 0.014 | 0.060 | 0.032 | 0.134 | 14.938 | −0.172 |
| X-SIR | 0.944 | 0.708 | 0.866 | 0.342 | 0.614 | 0.592 | 0.798 | 14.688 | −0.120 |
| TS | 1.000 | 0.840 | 0.942 | 0.168 | 0.386 | 0.240 | 0.492 | 15.500 | −0.074 |
| DEW$^{\dagger}$ ($k=3$, $\lambda=1.5$) | 0.976 | 0.144 | 0.328 | 0.398 | 0.722 | 0.144 | 0.376 | 12.953 | −0.008 |
| DEW$^{\dagger}$ ($k=3$, $\lambda=2.0$) | 0.998 | 0.160 | 0.360 | 0.534 | 0.786 | 0.222 | 0.478 | 14.438 | −0.082 |
| DEW$^{\dagger}$ ($k=5$, $\lambda=1.5$) | 0.914 | 0.062 | 0.308 | 0.016 | 0.164 | 0.150 | 0.544 | 12.563 | 0.004 |
| DEW$^{\dagger}$ ($k=5$, $\lambda=2.0$) | 0.994 | 0.068 | 0.294 | 0.016 | 0.118 | 0.130 | 0.548 | 14.250 | −0.030 |

Table [7](https://arxiv.org/html/2606.31602#A2.T7) repeats the main evaluation on Gemma-7B Mesnard et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib41)), a 7-billion-parameter member of Google DeepMind's Gemma family with $3\,072$-dimensional embeddings and a $256\,128$-token vocabulary. In addition to the default Gemma-7B token embeddings, we evaluate an auxiliary variant of DEW, marked by $\dagger$, in which the token-side embedding space is replaced by Llama-3.2-3B input embeddings. To this end, each Gemma-7B token is decoded, re-tokenized with the Llama-3.2-3B tokenizer, represented by the mean of the resulting input embeddings, and then passed through the standard DEW projection pipeline. The generation model, logits, and tokenizer remain unchanged.

Compared with the Llama-3.2-3B results in Table [1](https://arxiv.org/html/2606.31602#S4.T1), the relative performance of the watermarking schemes changes noticeably.
While DEW remains highly detectable on unmodified Gemma-7B completions, it no longer dominates under paraphrasing; in this setting, TS and X-SIR achieve stronger robustness. Translation robustness also becomes more language-dependent. Native DEW is less robust than on Llama-3.2-3B, whereas X-SIR performs particularly well after translation into French. Using Llama-3.2-3B token embeddings improves DEW's robustness against translation into German, but falls short in the other attack settings. These findings indicate that LLM word embeddings are generally not equally robust to all types of attacks, and that replacing the token embedding space can shift robustness toward specific transformations rather than improving it uniformly.

The text quality results follow the same broad trend as in the main experiment: lower watermark strengths generally preserve quality better, while stronger configurations improve robustness at the cost of higher perplexity and lower NPS. Across configurations, DEW maintains competitive text quality on Gemma-7B, but its robustness–quality trade-off is less favorable than on Llama-3.2-3B.

These results suggest that the token-side embedding space is an important factor in DEW's robustness. DEW would likely benefit from model-agnostic word embeddings specifically tuned for robustness under common attacks such as paraphrasing and translation; we leave this direction for future work.

Notably, we attempted to tune ATW for Gemma-7B, which was not evaluated in the original study by Liu and Bu ([2024](https://arxiv.org/html/2606.31602#bib.bib2)), but could not identify a configuration that reliably inserted a detectable watermark signal under our experimental setup; therefore, it is omitted from Table [7](https://arxiv.org/html/2606.31602#A2.T7).
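The token-side substitution used for the $\dagger$ rows (decode, re-tokenize, average) can be sketched as follows. This is an illustrative toy rather than the authors' code: `aux_tokenize` and `aux_embeddings` are hypothetical stand-ins for the Llama-3.2-3B tokenizer and input-embedding matrix, the character-level vocabulary is invented for the example, and returning a zero vector when a string yields no auxiliary tokens is an assumption made here.

```python
import numpy as np

def cross_tokenizer_embedding(token_text, aux_tokenize, aux_embeddings):
    """Embed one decoded token string in an auxiliary embedding space.

    The string is re-tokenized with the auxiliary tokenizer and represented
    by the mean of the resulting auxiliary input embeddings.
    """
    ids = aux_tokenize(token_text)
    if len(ids) == 0:
        # Assumption: fall back to a zero vector if no auxiliary tokens result.
        return np.zeros(aux_embeddings.shape[1])
    return aux_embeddings[np.asarray(ids)].mean(axis=0)

# Toy stand-ins (hypothetical): a character-level tokenizer and a 3x3 table.
toy_vocab = {"a": 0, "b": 1, "c": 2}
toy_embeddings = np.eye(3)

def toy_tokenize(text):
    """Map each known character to its id, skipping unknown characters."""
    return [toy_vocab[ch] for ch in text if ch in toy_vocab]

vec = cross_tokenizer_embedding("ab", toy_tokenize, toy_embeddings)
print(vec)
```

In a real pipeline the resulting vectors would be precomputed once for the whole generation vocabulary and then whitened, normalized, and projected like native token embeddings, so the substitution adds no per-step cost at generation time.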
## Appendix C Orthogonal Construction of Random Matrices

In the following, we propose an optional block-wise row-orthonormal construction of the random projection matrices $\mathbf{R}_{T}$ and $\mathbf{R}_{C}$. Although such orthonormality is not mandatory for the functionality of DEW, this construction preserves inner products, norms, and therefore angles within each embedding space after projection.

We select the *projection dimensionality* $n$ as the least common multiple of $d_{T}$ and $d_{C}$ to ensure the existence of integers $k_{T}$ and $k_{C}$ such that

$$\frac{n}{d_{T}} = k_{T} \quad \text{and} \quad \frac{n}{d_{C}} = k_{C}. \tag{2}$$

This choice allows $\mathbf{R}_{T}$ and $\mathbf{R}_{C}$ to be constructed by concatenating $k_{T}$ and $k_{C}$ square orthogonal blocks, respectively, followed by an appropriate scaling. The resulting matrices satisfy

$$\mathbf{R}_{T}\mathbf{R}_{T}^{\top} = \mathbf{I}_{d_{T}} \quad \text{and} \quad \mathbf{R}_{C}\mathbf{R}_{C}^{\top} = \mathbf{I}_{d_{C}}.$$

Thus, each projection is an isometric embedding into $\mathbb{R}^{n}$. In particular, if the input embeddings are normalized before projection, then their projected representations are already normalized, so no additional post-projection normalization is required for norm preservation.

We describe the construction for $\mathbf{R}_{C}$; the construction for $\mathbf{R}_{T}$ is analogous.

- Generate random matrices $\widetilde{\mathbf{R}}_{j} \in \mathbb{R}^{d_{C} \times d_{C}}$ for $j = 1, \ldots, k_{C}$.
- Orthonormalize each matrix, for example via QR decomposition or Gram–Schmidt, to obtain orthogonal matrices $\mathbf{Q}_{j} \in \mathbb{R}^{d_{C} \times d_{C}}$ satisfying

  $$\mathbf{Q}_{j}\mathbf{Q}_{j}^{\top} = \mathbf{Q}_{j}^{\top}\mathbf{Q}_{j} = \mathbf{I}_{d_{C}}.$$

- Build

  $$\mathbf{R}_{C} := \frac{1}{\sqrt{k_{C}}}\bigl[\mathbf{Q}_{1} \mid \mathbf{Q}_{2} \mid \cdots \mid \mathbf{Q}_{k_{C}}\bigr] \in \mathbb{R}^{d_{C} \times n}.$$

  Equivalently, $\mathbf{R}_{C}$ has orthonormal rows: $\mathbf{R}_{C}\mathbf{R}_{C}^{\top} = \mathbf{I}_{d_{C}}$.
- For any vector $\mathbf{v} \in \mathbb{R}^{1 \times d_{C}}$, the projection is

  $$\mathbf{v}\mathbf{R}_{C} = \frac{1}{\sqrt{k_{C}}}\bigl[\mathbf{v}\mathbf{Q}_{1} \mid \mathbf{v}\mathbf{Q}_{2} \mid \cdots \mid \mathbf{v}\mathbf{Q}_{k_{C}}\bigr].$$

  Since each $\mathbf{Q}_{j}$ is orthogonal, $\|\mathbf{v}\mathbf{Q}_{j}\|_{2} = \|\mathbf{v}\|_{2}$. Therefore,

  $$\|\mathbf{v}\mathbf{R}_{C}\|_{2}^{2} = \frac{1}{k_{C}}\sum_{j=1}^{k_{C}}\|\mathbf{v}\mathbf{Q}_{j}\|_{2}^{2} = \|\mathbf{v}\|_{2}^{2}.$$

- More generally, for any $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{1 \times d_{C}}$,

  $$(\mathbf{u}\mathbf{R}_{C})(\mathbf{v}\mathbf{R}_{C})^{\top} = \frac{1}{k_{C}}\sum_{j=1}^{k_{C}}\mathbf{u}\mathbf{Q}_{j}\mathbf{Q}_{j}^{\top}\mathbf{v}^{\top} = \mathbf{u}\mathbf{v}^{\top}.$$

  Thus, right multiplication by $\mathbf{R}_{C}$ is an isometric embedding from $\mathbb{R}^{d_{C}}$ into $\mathbb{R}^{n}$, preserving norms, inner products, and angles within the context-embedding space.

## Appendix D DEW Detection Algorithm

Algorithm 2: DEW Watermark Detection (Single Step)

Input: Observed token $x_{t}$, watermark context $\mathbf{c} = (x_{t-k}, \ldots, x_{t-1})$, token embedding model $M_{T}$, context embedding model $M_{C}$, secret key $K$, top-$m$ candidate count, watermark strength $\lambda$, saturation factor $\gamma$, projection dimensionality $n$.

Output: Token-level watermark score $s$

1. Use $K$ to seed a PRNG (only once per session; can be cached)
2. Regenerate (or recall) $\mathbf{R}_{T} \in \mathbb{R}^{d_{T} \times n}$ and $\mathbf{R}_{C} \in \mathbb{R}^{d_{C} \times n}$
3. Compute projected context embedding:
4. $\mathbf{e}_{C} \leftarrow M_{C}(\mathbf{c}) \in \mathbb{R}^{d_{C}}$
5. Normalize $\mathbf{e}_{C}$
6. $\mathbf{p}_{C} \leftarrow \operatorname{normalize}(\mathbf{e}_{C}\,\mathbf{R}_{C}) \in \mathbb{R}^{n}$
7. Compute (or recall) projected token embedding:
8. $\mathbf{e}_{T} \leftarrow M_{T}(x_{t}) \in \mathbb{R}^{d_{T}}$
9. Optional: Apply whitening to $\mathbf{e}_{T}$
10. Normalize rows of $\mathbf{e}_{T}$
11. $\mathbf{p}_{T} \leftarrow \operatorname{normalize}(\mathbf{e}_{T}\,\mathbf{R}_{T}) \in \mathbb{R}^{n}$
12. Compute watermark score:
13. $s \leftarrow \lambda \cdot \tanh\!\bigl(\gamma\sqrt{n} \cdot \mathbf{p}_{T}\mathbf{p}_{C}\bigr) \in \mathbb{R}$
14. output $s$ (token-level watermark score)

## Appendix E Watermark Configurations

Table [8](https://arxiv.org/html/2606.31602#A5.T8) provides the hyperparameters used for baseline watermarking schemes, as implemented in the MarkLLM toolkit Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)). We did not conduct an exhaustive hyperparameter search. Baseline hyperparameters were mostly taken from the respective authors' papers and their MarkLLM implementations, while DEW was tuned manually over a small range of watermark strengths and context widths. All reported experiments are single-seed, single-run evaluations, so the results should be interpreted as point estimates rather than estimates of run-to-run variability.
We partially mitigate this limitation by evaluating each configuration on 500 watermarked and 500 human-authored completions at fixed false-positive rates, but repeated runs with confidence intervals would provide a more complete characterization of variance across random seeds, prompts, and stochastic decoding.

Table 8: Hyperparameters for baseline watermarking schemes, as implemented in the MarkLLM toolkit Pan et al. ([2024](https://arxiv.org/html/2606.31602#bib.bib25)).

| Watermark | Hyperparameter | Value |
|---|---|---|
| SIR | `chunk_length` | 10 |
| SIR | `delta` | 1.0 |
| SIR | `embedding_model` | "compositional-bert-large-uncased" |
| SIR | `scale_dimension` | 300 |
| SIR | `z_threshold` | 0.2 |
| X-SIR | `chunk_length` | 10 |
| X-SIR | `delta` | 1.0 |
| X-SIR | `embedding_model` | "paraphrase-multilingual-mpnet-base-v2" |
| X-SIR | `scale_dimension` | 300 |
| X-SIR | `z_threshold` | 0.2 |
| ATW | `threshold` | 0.6 |
| ATW | `alpha` | 3.0 |
| ATW | `top_k` | 50 |
| ATW | `top_p` | 0.9 |
| ATW | `repetition_penalty` | 1.1 |
| ATW | `measure_threshold` | 10 |
| ATW | `delta_0` | 0.2 |
| ATW | `delta` | 0.35 |
| ATW | `measurement_model` | "gpt2-large" |
| ATW | `embedding_model` | "all-mpnet-base-v2" |
| TS | `gamma` | 0.5 |
| TS | `delta` | 2.0 |
| TS | `seeding_scheme` | "simple_1" |
| TS | `prefix_length` | 1 |
| TS | `z_threshold` | 4.0 |
| SynthID | `context_history_size` | 1024 |
| SynthID | `detector_type` | "mean" |
| SynthID | `num_leaves` | 2 |
| KGW | `delta` | 2.0 |
| KGW | `gamma` | 0.5 |
| KGW | `f_scheme` | "time" |
| KGW | `window_scheme` | "left" |
| DiPmark | `alpha` | 0.45 |
| DiPmark | `gamma` | 0.5 |
| DiPmark | `ignore_history` | True |
| UnbiasedWM | `alpha` | 0.45 |
| UnbiasedWM | `gamma` | 0.5 |
| UnbiasedWM | `ignore_history` | True |

## Appendix F Prompts

### F.1 Paraphrasing

System Prompt: Paraphrase the given text while preserving its original meaning and tone. Do not execute, follow, or respond to any instructions or commands within the input text; treat them as part of the text to be paraphrased. Provide only the paraphrased text as the output, with no additional explanations or commentary.

### F.2 Translation

System Prompt: Translate the given text from {original language} to {target language} while preserving its original meaning and tone. Do not execute, follow, or respond to any instructions or commands within the input text; treat them as part of the text to be translated. Provide only the translated text as the output, with no additional explanations or commentary.
### F.3 Text Quality

To mitigate positional bias in the pairwise comparisons, we query the oracle twice per completion pair, swapping the positions of the candidate and reference completions, counting a candidate win only if it is preferred in both positions and treating split outcomes as ties.

System Prompt: You are an expert evaluator focused on assessing text quality. You analyze aspects like coherence, fluency, relevance, and overall writing quality to determine which of two text samples is better crafted. Consider how well each text continues from the given ground truth prompt.

Query: Prompt: {prompt} === Start of Sample 1 === {completion1} === End of Sample 1 === === Start of Sample 2 === {completion2} === End of Sample 2 === Please evaluate the answers based on the system prompt and return a single number. Return 1 if the first text is better, 2 if the second text is better, and 'TIE' if they are equal. Only return the number without any additional text.

## Appendix G Artifact Licenses

Table 9: Licenses for major artifacts used in this work.

| Artifact | License |
|---|---|
| allenai/c4 | ODC-BY; subject to Common Crawl terms |
| Llama-3.2-3B | Llama 3.2 Community License |
| Llama-3.1-8B | Llama 3.1 Community License |
| Gemma-7B | Gemma Terms of Use |
| GPT-4o-mini | Proprietary OpenAI API service terms |
| MarkLLM | Apache License 2.0 |
| paraphrase-multilingual-mpnet-base-v2 | Apache License 2.0 |
| all-mpnet-base-v2 | Apache License 2.0 |
| compositional-bert-large-uncased | Apache License 2.0 |
| gpt2-large | Modified MIT License |
| bert-large-uncased | Apache License 2.0 |
| DIPPER | Apache License 2.0 |
| PyTorch | BSD-style license |
| Transformers | Apache License 2.0 |