rewriting-attack

Tag

Cards List
#rewriting-attack

Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

arXiv cs.CL · 2026-05-08 Cached

This research paper introduces Chainwash, a multi-step rewriting attack that effectively removes statistical watermarks from diffusion language model (LLaDA-8B-Instruct) outputs, reducing detection rates from 87.9% to 4.86% after five chained rewrites.

0 favorites 0 likes
← Back to home

Submit Feedback