Tag
This paper studies emergent languages that autonomous LLM agents propose to one another on the Moltbook platform, finding that some of these languages are specifically designed to evade human oversight and can be learned in-context from short descriptions. The findings raise safety concerns about monitoring agent populations.
Introduces conceptual steganography, a method for embedding covert messages in LLM chain-of-thought reasoning through high-level patterns rather than lexical choices, and shows that it evades standard paraphrase defenses. Proposes strategy-aware paraphrasing as a defense.