Tag
SLAM is a novel white-box watermarking scheme that uses sparse autoencoders to embed marks into the structural geometry of LLM residual streams. It reports 100% detection accuracy with minimal quality loss on Gemma-2 models and avoids the token-distribution biasing of prior methods.
A security researcher details how Google’s SynthID invisible watermark for AI-generated images can be reversed, undermining media-provenance claims and highlighting fundamental flaws in proprietary watermarking schemes.
This paper proposes methods for protecting large language models against unauthorized knowledge distillation by rewriting reasoning traces to degrade training usefulness while preserving correctness, and embedding verifiable watermarks in distilled student models. The approach uses instruction-based and gradient-based rewriting techniques to achieve anti-distillation effects without compromising teacher model performance.
This paper introduces STELA, a linguistics-aware watermarking framework for LLMs that leverages syntactic predictability via POS n-grams to balance text quality and detection robustness. The method enables publicly verifiable watermark detection without requiring access to model logits, demonstrating superior performance across typologically diverse languages (English, Chinese, Korean).
Google DeepMind upgraded its speech synthesis model to sound more natural across 70+ languages and now applies SynthID watermarking to all outputs.
Google announced SynthID Detector, a verification portal that identifies AI-generated content across images, audio, video, and text by detecting imperceptible SynthID watermarks embedded in media created with Google's AI tools. The platform is rolling out to early testers with plans for broader availability to journalists, media professionals, and researchers.
OpenAI announces tools and research efforts to help verify content authenticity, including text watermarking, metadata approaches, and expanded image detection with C2PA metadata integration for tracking AI-generated and edited content.