machine-unlearning

#machine-unlearning

CBD: API-Only LLM Black-Box Unlearning through Controlled Behavioral Divergence

arXiv cs.LG · yesterday

CBD is an API-only black-box unlearning framework for LLMs that uses two auxiliary models to create controlled behavioral divergence between retained and target data, achieving a better unlearning-utility trade-off than existing methods.

Position: The Term "Machine Unlearning" Is Overused in LLMs

arXiv cs.CL · yesterday

This position paper argues that the term 'machine unlearning' is overused in LLM research, advocating for stricter terminology tied to dataset-defined deletion and retraining-equivalence guarantees.

Erased, but Not Gone: Output Forgetting Is Not True Forgetting

arXiv cs.LG · 5d ago

This paper argues that standard output-level evaluations of machine unlearning overestimate success, showing that methods can appear successful at the output layer while retaining structured representation-level discrepancies relative to retrained models. The authors propose retraining-consistent representation forgetting as a stronger evaluative lens.

Selective Capability Unlearning in End-to-End Spoken Language Understanding

arXiv cs.CL · 6d ago

Proposes BindingSubspace (BSU), a representation-level framework that isolates and attenuates intent-conditioned directions in end-to-end spoken language understanding (SLU) models. It targets capability persistence, in which suppressing an intent still allows its slots to be generated under forced prefixes. The method reduces forced-prefix recoverability while preserving retained performance on SLU benchmarks.

PreUnlearn: Auditing Collateral Knowledge Damage Before Large Language Model Unlearning

arXiv cs.CL · 2026-06-18

This paper proposes PreUnlearn, a framework for auditing collateral knowledge damage in LLM unlearning before execution, using data-centric analysis to predict downstream damage across semantic layers.

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

arXiv cs.LG · 2026-06-18

Proposes SAGE, a post-hoc method to sanitize the final unlearning vector in LLMs, improving the retain-forget trade-off without rerunning the unlearning pipeline.

RepSelect: Robust LLM Unlearning via Representation Selectivity

arXiv cs.CL · 2026-06-17

RepSelect is a method for robust LLM unlearning that isolates forget-set-specific representations by collapsing the top principal components of weight gradients, achieving 4–50× better robustness against relearning attacks than existing baselines across multiple model families.

SPACE: Source-free Proxy Anchor Concept Erasure for MLLMs

arXiv cs.LG · 2026-06-10

This paper introduces SPACE, the first source-free unlearning framework for multimodal large language models (MLLMs). It uses text-guided proxy anchor selection and dual-constraint semantic isolation to erase target concepts without access to the original training data, achieving performance comparable to data-dependent methods.

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

arXiv cs.CL · 2026-06-10

The paper proposes TRACE, a method for machine unlearning in Mixture-of-Experts language models that calibrates retain regularization by reweighting token-level retain losses to address forget-retain routing mismatch. Experiments show improved forget-utility trade-off across multiple MoE LLMs.

Exact Unlearning in Reinforcement Learning

arXiv cs.LG · 2026-06-04

This paper formalizes exact unlearning in reinforcement learning, proposing a ρ-TV-stable RL algorithm for tabular MDPs that removes the influence of a user's data at a fraction of the cost of retraining while achieving near-minimax-optimal regret bounds. The work, accepted at ICML, establishes both upper and lower bounds for ρ-TV-stable RL algorithms.

Fast Unlearning at Scale via Margin Self-Correction

arXiv cs.LG · 2026-06-03

Introduces MASC (Margin Self-Correction), an efficient unlearning method for LLMs that uses an online stopping rule to achieve competitive forget–retain trade-offs at reduced computational cost, validated on TOFU and MUSE benchmarks.

AMNESIA: A Large Scale Medical Unlearning Benchmark Suite with Disease-Informed Analysis

arXiv cs.LG · 2026-06-01

AMNESIA is the first large-scale open-source benchmark for medical unlearning, comprising 70,560 QA pairs from 8,820 patient notes across 11 diseases, designed to evaluate forgetting of both factual and reasoning knowledge in LLMs.

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

arXiv cs.LG · 2026-06-01

The paper identifies a blind spot in machine unlearning benchmarks, the underrepresentation of causal (Why-type) knowledge, and proposes two remedies: 5WBench, a balanced benchmark, and MAAT, a three-phase unlearning framework on LoRA adapters that achieves high forgetting and retention on causal facts.

MAAT: Multi-phase Adapter-Aware Targeted Unlearning

Hugging Face Daily Papers · 2026-05-28

Introduces MAAT, a multi-phase LoRA-adapter unlearning method, along with the 5WBENCH benchmark. The work reveals that causal 'Why' knowledge is uniquely difficult to forget due to long multi-hop answer chains and gradient dilution, and reports strong forget–retain trade-offs on Llama 3.2-3B.

Approximate Machine Unlearning through Manifold Representation Forgetting Guided by Self Mode Connectivity

arXiv cs.LG · 2026-05-25

This paper proposes ManiF-SMC, an approximate machine unlearning method that operates entirely in representation space. It pushes erased samples away from their original learned manifold representation and toward their nearest semantic neighbors in the retained data, using a margin-based triplet loss whose margins are set adaptively by a self-mode-connectivity module.

DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models

arXiv cs.LG · 2026-05-22

Introduces DualOptim+, an optimization framework for LLM unlearning that uses shared base states and decoupled delta states to balance forgetting and retaining objectives, with a quantized variant for reduced memory.

Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models

arXiv cs.CL · 2026-05-21

This paper revisits the reliability paradox in the context of machine unlearning for language models, demonstrating that models can achieve low calibration error while relying on shortcut-based decision rules, thereby extending the paradox to unlearned models.

Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions

arXiv cs.LG · 2026-05-21

Introduces HF-KCU, a method for efficient machine unlearning in federated learning that uses Krylov subspace approximations to remove a client's contribution, achieving significant speedup over retraining while preserving model accuracy and providing robustness against adversarial perturbations.

Interference-Aware Multi-Task Unlearning

arXiv cs.AI · 2026-05-20

This paper introduces an interference-aware framework for multi-task machine unlearning, addressing task-level and instance-level interference through task-aware gradient projection and instance-level gradient orthogonalization, achieving effective unlearning on multi-task computer vision benchmarks.

ASRU: Activation Steering Meets Reinforcement Unlearning for Multimodal Large Language Models

arXiv cs.CL · 2026-05-18

Proposes ASRU, a controllable multimodal unlearning framework that combines activation steering with a reinforcement learning reward function to improve unlearning effectiveness and generation quality while preserving model utility on Qwen3-VL.
