Tag
This paper presents an adversarial methodology for creating and detecting AI-generated social bot content, curating a multilingual, cross-platform dataset of paired human and AI messages. Training on this adversarial data yields detection that significantly outperforms existing content-based bot detection models in real-world settings.
This paper introduces a deliberative curation protocol for multi-agent knowledge bases, addressing governance gaps such as agent statelessness and sycophancy. It evaluates the protocol via simulation, showing improved resilience under adversarial conditions.
CSULoRA is a post-hoc method for correcting trained LoRA adapters to preserve safety alignment while maintaining utility, using closest safe update estimation.
PolyGnosis is an adversarial multi-model consensus system built as a Hermes skill. It runs three AI models in parallel, each with a different expert persona, then passes their outputs through a hostile critic phase, scores them with RRF (Reciprocal Rank Fusion) and Borda Count, and applies a synthesis gate. The whole system was built agentically using DeepSeek V4-Pro.
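To make the two scoring rules named in the PolyGnosis summary concrete, here is a minimal sketch of Reciprocal Rank Fusion and Borda Count over several ranked lists. It is an illustration of the standard textbook definitions, not PolyGnosis's actual code: the function names, the `answer_*` labels, and the smoothing constant `k=60` (a commonly used default for RRF) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Score each item as the sum of 1 / (k + rank) over all rankings (rank is 1-based)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return dict(scores)


def borda_count(rankings):
    """Give each item (n - 1 - position) points per ranking of length n, then sum."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    return dict(scores)


# Hypothetical input: three critics each rank the same three candidate answers.
rankings = [
    ["answer_a", "answer_b", "answer_c"],
    ["answer_b", "answer_a", "answer_c"],
    ["answer_a", "answer_c", "answer_b"],
]

rrf = reciprocal_rank_fusion(rankings)
borda = borda_count(rankings)
best_rrf = max(rrf, key=rrf.get)
best_borda = max(borda, key=borda.get)
```

Both rules aggregate ranks rather than raw scores, so they can combine judgments from models whose scores are not on a comparable scale. How PolyGnosis combines the two results is not stated in the summary.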
This paper reveals the existence of hidden human-like spans in machine-generated texts and proposes a model-agnostic stacked enhancement framework that improves existing detectors by reducing the influence of these spans.
This paper proposes an adversarial Sobolev alignment method for faithful image super-resolution, aiming to reduce artifacts and improve fidelity.
The author builds two multi-agent AI systems with opposite design philosophies: ChaoticAI (collaborative, org-chart-based) and S.A.G.E. with RAAC (adversarial argumentation). The post shares reflections on memory architecture and the potential synthesis of both approaches.
NewsLens introduces a multi-agent framework for navigating and exposing adversarial news bias, with the aim of identifying and countering biased content in news media.
ALSO introduces a framework for online strategy optimization in multi-agent social simulation, formulating multi-turn interaction as an adversarial bandit problem and using a neural surrogate for reward prediction. Experiments on the Sotopia benchmark show it outperforms static baselines and existing optimization methods.
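For readers unfamiliar with the adversarial bandit setting mentioned in the ALSO summary, the following is a minimal sketch of EXP3, the standard textbook algorithm for that setting. It is not ALSO's method: the summary does not say which bandit algorithm is used, and the neural surrogate for reward prediction is not modelled here. The arms stand in for hypothetical candidate strategies, and `toy_reward`, `gamma`, and the seed are illustrative assumptions.

```python
import math
import random


def exp3(num_arms, reward_fn, rounds, gamma=0.1, seed=0):
    """Run EXP3 and return how often each arm was pulled.

    reward_fn(t, arm) must return a reward in [0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    for t in range(rounds):
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale to avoid overflow; this leaves the probabilities unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
        pulls[arm] += 1
    return pulls


def toy_reward(t, arm):
    """Hypothetical reward: strategy 2 always succeeds, the others never do."""
    return 1.0 if arm == 2 else 0.0


rounds = 1000
pulls = exp3(num_arms=3, reward_fn=toy_reward, rounds=rounds)
```

In this toy run the pull counts concentrate on arm 2, while the `gamma` term keeps a small amount of exploration on the other arms. EXP3 makes no stochastic assumption about rewards, which is what distinguishes the adversarial setting from the stochastic one.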
This research paper introduces Chainwash, a multi-step rewriting attack that removes statistical watermarks from the outputs of a diffusion language model (LLaDA-8B-Instruct), reducing the detection rate from 87.9% to 4.86% after five chained rewrites.