adversarial-robustness

#adversarial-robustness

Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

arXiv cs.LG · 6d ago

Proposes Latent-Predictive Counterfactual Decoupling (LPCD) to address tactical out-of-distribution shifts in live streaming risk assessment by decoupling stable malicious intent from evolving narrative tactics at the latent level, achieving superior performance on large-scale industrial datasets.

RRISE: Robust Radius Inference via a Surrogate Estimator

arXiv cs.LG · 6d ago

RRISE introduces a learned surrogate estimator that reduces the Monte Carlo sampling cost of randomized smoothing for certified robustness to a single forward pass, maintaining accuracy within 0.84 percentage points while replacing up to 10^4 evaluations per query.
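
For context on the sampling cost mentioned above, here is a minimal sketch of standard Monte Carlo certification for randomized smoothing. It is not the RRISE surrogate. The toy `base_classifier`, the Hoeffding lower bound (a simpler and looser stand-in for the Clopper-Pearson interval commonly used), and all parameter values are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def base_classifier(x):
    # Hypothetical toy base classifier: class 1 if the coordinates sum to > 0.
    return 1 if sum(x) > 0 else 0


def sample_counts(x, sigma, n, rng):
    # Classify n copies of x perturbed with isotropic Gaussian noise.
    counts = [0, 0]
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[base_classifier(noisy)] += 1
    return counts


def certify(x, sigma=0.5, n0=100, n=10_000, alpha=0.001, seed=0):
    rng = random.Random(seed)
    # Stage 1: a small sample picks the candidate top class.
    guess = sample_counts(x, sigma, n0, rng)
    top = 0 if guess[0] >= guess[1] else 1
    # Stage 2: a fresh, larger sample estimates that class's probability.
    counts = sample_counts(x, sigma, n, rng)
    p_hat = counts[top] / n
    # One-sided Hoeffding bound: holds with probability at least 1 - alpha.
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: no certificate at this confidence level
    # Certified L2 radius: sigma times the standard normal quantile of p_lower.
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top, radius


label, radius = certify([2.0, 2.0])
print(label, round(radius, 3))
```

Each call spends `n0 + n` base-classifier evaluations on a single input, which is the per-query cost a single-forward-pass estimator would aim to avoid.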

Making Brain-Computer Interfaces More Secure

arXiv cs.LG · 6d ago

This paper proposes a lightweight CNN architecture to improve adversarial robustness in EEG-based brain-computer interfaces, evaluating it against adversarial attacks and showing better classification performance than existing models.

TASER: Task-Aware Stein Regularisation for Geometry-Driven Robustness

arXiv cs.LG · 2026-06-01

Introduces TASER, a training-time regularization framework derived from Langevin Stein operators that encourages geometric compatibility between predictors and data density, improving adversarial robustness and stability on CIFAR-10 without significant clean accuracy degradation.

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

arXiv cs.AI · 2026-06-01

Introduces PReMISE, a framework for discovering and auditing policy-level rubrics for LLM judges along four axes: structural adequacy, reliability, preference fit, and adversarial robustness.

The Distillation Game: Adaptive Attacks & Efficient Defenses

Hugging Face Daily Papers · 2026-05-29

This paper studies distillation attacks where model outputs can enable imitation, proposing a minimax game framework and a forward-pass-only defense called Product-of-Experts, showing that adaptive students recover more capability than passive evaluation suggests.

The Hamilton-Jacobi Theory of Deep Learning

Hugging Face Daily Papers · 2026-05-27

This paper identifies neural network training as a search through Hamilton-Jacobi initial-value problems, showing that residual networks, transformers, and RNNs discretize the same class of viscous Hamilton-Jacobi equations. It derives quantitative consequences including minimax optimal generalization rates, adversarial robustness bounds, and a closed-form influence function.

Provable Robustness against Backdoor Attacks via the Primal-Dual Perspective on Differential Privacy

arXiv cs.LG · 2026-05-22

This paper introduces a framework that connects randomized smoothing to differential privacy through privacy profiles, enabling tight provable robustness guarantees against backdoor attacks that jointly affect training and inference. The approach is instantiated for DP-SGD and Deep Partition Aggregation with experiments on MNIST and CIFAR-10.

Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions

arXiv cs.LG · 2026-05-21

Introduces HF-KCU, a method for efficient machine unlearning in federated learning that uses Krylov subspace approximations to remove a client's contribution, achieving significant speedup over retraining while preserving model accuracy and providing robustness against adversarial perturbations.

more ai slop to slop around~

Reddit r/singularity · 2026-05-17

This post extends E8 lattice geometric activation injection to supervised LLM safety routing, using STE-snapped E8 policy heads. While achieving near-perfect routing on clean data, the approach catastrophically fails under adversarial stress, requiring a hybrid symbolic-geometric architecture with audited deterministic rules.

Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

arXiv cs.CL · 2026-05-15

This paper introduces Context-Driven Decomposition (CDD), a probe to diagnose when RAG systems comply with retrieved context despite conflicting parametric knowledge, and releases the Epi-Scale benchmark for systematic study across model families.

Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms

arXiv cs.AI · 2026-05-12

This paper introduces Latent Personality Alignment (LPA), a method that improves LLM safety by training on abstract personality traits rather than explicit harmful examples. The approach achieves better generalization against adversarial attacks and preserves model utility with significantly fewer training samples.

GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives

arXiv cs.CL · 2026-05-12

This paper introduces GAMBIT, a benchmark for evaluating adversarial robustness in multi-agent LLM collectives, featuring adaptive imposters and recalibration modes to address the limitations of existing shallow evaluations.

Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents

arXiv cs.AI · 2026-05-11

This paper introduces the Adversarial Empathy Benchmark (AEB) and Emotional Consistency Score (ECS) to test the robustness of RLVER-trained models against adversarial user behaviors. Results show that while RLVER improves emotional responsiveness, it does not significantly enhance the model's ability to track user emotional states under adversarial conditions.

Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics

arXiv cs.LG · 2026-05-11

This paper investigates adversarial robustness in Fuzzy ARTMAP, a streaming neural architecture, by introducing WB-Softmax as a mechanism-aligned white-box attack surrogate. It evaluates progressive training and selective updating strategies to improve robustness without data replay, while also offering interpretable diagnostics for structural failures.

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

arXiv cs.CL · 2026-05-11

This paper introduces MELD, a detector for AI-generated text that uses multi-task learning with auxiliary heads for generator family, attack type, and source domain to improve robustness. MELD achieves strong performance on the RAID benchmark and maintains low false-positive rates under adversarial attacks.

Adversarial Graph Neural Network Benchmarks: Towards Practical and Fair Evaluation

arXiv cs.LG · 2026-05-08

This paper presents a comprehensive benchmark for evaluating adversarial attacks and defenses in Graph Neural Networks, highlighting the need for standardized and fair experimental protocols.

Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation

arXiv cs.CL · 2026-05-08

Proposes LiSCP, a lightweight stylistic consistency profiling method for robust detection of LLM-generated textual content, focusing on feature stability under adversarial manipulation. Achieves superior performance on in-domain and cross-domain detection with notable robustness.

Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs

arXiv cs.CL · 2026-04-20

This paper introduces a resource-efficient pruning framework that identifies and removes parameters associated with unsafe behaviors in large language models while preserving utility. Using gradient-free attribution and the Lottery Ticket Hypothesis perspective, the method achieves significant reductions in unsafe generations and improved robustness against jailbreak attacks with minimal performance loss.

Stochasticity in Tokenization Improves Robustness

arXiv cs.CL · 2026-04-20

This paper demonstrates that training large language models with stochastic tokenization instead of deterministic canonical tokenization significantly improves robustness to adversarial attacks and random perturbations, with improvements shown across pre-training, fine-tuning, and in-context learning without increasing inference costs.
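
As a rough illustration of the idea, the sketch below contrasts a deterministic tokenizer with one that samples a segmentation at random. The toy `VOCAB`, the greedy longest-match stand-in for canonical tokenization, and uniform sampling over valid segmentations are all assumptions; the paper's actual sampling scheme may differ.

```python
import random

# Hypothetical toy vocabulary; single characters guarantee every split is valid.
VOCAB = {"unbeliev", "un", "believ", "able",
         "u", "n", "b", "e", "l", "i", "v", "a"}


def canonical_tokenize(text, vocab):
    # Deterministic stand-in: greedy longest-match from left to right.
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize at position {i}")
    return tokens


def stochastic_tokenize(text, vocab, rng):
    n = len(text)
    # ways[i] counts the valid segmentations of the suffix text[i:].
    ways = [0] * (n + 1)
    ways[n] = 1
    for i in range(n - 1, -1, -1):
        ways[i] = sum(ways[j] for j in range(i + 1, n + 1) if text[i:j] in vocab)
    if ways[0] == 0:
        raise ValueError("text has no valid segmentation")
    # Picking each token end with weight ways[j] samples uniformly
    # over all valid segmentations of the full string.
    tokens, i = [], 0
    while i < n:
        ends = [j for j in range(i + 1, n + 1) if text[i:j] in vocab and ways[j] > 0]
        j = rng.choices(ends, weights=[ways[e] for e in ends], k=1)[0]
        tokens.append(text[i:j])
        i = j
    return tokens


rng = random.Random(0)
print(canonical_tokenize("unbelievable", VOCAB))
print(stochastic_tokenize("unbelievable", VOCAB, rng))
```

Every sampled segmentation decodes to the same string, so a model trained this way sees many token sequences for one surface form, while inference can still use the deterministic tokenizer.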
