Tag
CBD introduces an API-only black-box unlearning framework for LLMs that uses two auxiliary models to create controlled behavioral divergence between retained and target data, achieving a better unlearning-utility trade-off than existing methods.
This position paper argues that the term 'machine unlearning' is overused in LLM research, advocating for stricter terminology tied to dataset-defined deletion and retraining-equivalence guarantees.
This paper argues that standard output-level evaluations of machine unlearning overestimate success, showing that methods can appear successful at the output layer while retaining structured representation-level discrepancies relative to retrained models. The authors propose retraining-consistent representation forgetting as a stronger evaluative lens.
Proposes BindingSubspace (BSU), a representation-level framework that isolates and attenuates intent-conditioned directions in end-to-end spoken language understanding (SLU) models. It targets capability persistence, in which a suppressed intent still permits slot generation under forced prefixes, and reduces forced-prefix recoverability while preserving retained performance on SLU benchmarks.
This paper proposes PreUnlearn, a framework for auditing collateral knowledge damage in LLM unlearning before execution, using data-centric analysis to predict downstream damage across semantic layers.
Proposes SAGE, a post-hoc method to sanitize the final unlearning vector in LLMs, improving the retain-forget trade-off without rerunning the unlearning pipeline.
RepSelect introduces a method for robust LLM unlearning that isolates forget-set-specific representations by collapsing top principal components of weight gradients, achieving 4–50× greater robustness against relearning attacks than existing baselines across multiple model families.
This paper introduces SPACE, the first source-free unlearning framework for multimodal large language models (MLLMs), which uses text-guided proxy anchor selection and dual-constraint semantic isolation to erase target concepts without requiring access to original training data, achieving performance comparable to data-dependent methods.
The paper proposes TRACE, a method for machine unlearning in Mixture-of-Experts language models that calibrates retain regularization by reweighting token-level retain losses to address forget-retain routing mismatch. Experiments show improved forget-utility trade-off across multiple MoE LLMs.
This paper formalizes exact unlearning in reinforcement learning, proposing a ρ-TV-stable RL algorithm for tabular MDPs that efficiently removes the influence of a user's data at a fraction of retraining cost, achieving near-minimax-optimal regret bounds. The work was accepted at ICML and establishes both upper and lower bounds for ρ-TV-stable RL algorithms.
Introduces MASC (Margin Self-Correction), an efficient unlearning method for LLMs that uses an online stopping rule to achieve competitive forget–retain trade-offs at reduced computational cost, validated on TOFU and MUSE benchmarks.
AMNESIA is the first large-scale open-source benchmark for medical unlearning, comprising 70,560 QA pairs from 8,820 patient notes across 11 diseases, designed to evaluate forgetting of both factual and reasoning knowledge in LLMs.
The paper identifies a blind spot in machine unlearning benchmarks, namely the underrepresentation of causal (Why-type) knowledge. It proposes 5WBench, a balanced benchmark, and Maat, a three-phase unlearning framework operating on LoRA adapters that achieves high forgetting and retention on causal facts.
Introduces MAAT, a multi-phase LoRA-adapter unlearning method, along with the 5WBENCH benchmark. The work finds that causal 'Why' knowledge is uniquely difficult to forget because of long multi-hop answer chains and gradient dilution, and reports strong forget–retain trade-offs on Llama 3.2-3B.
This paper proposes ManiF-SMC, an approximate machine unlearning method that operates entirely in representation space. It pushes erased samples away from their original learned manifold representation and toward their nearest semantic neighbors in the retained data, using a margin-based triplet loss whose margins are set adaptively by a self-mode-connectivity module.
Introduces DualOptim+, an optimization framework for LLM unlearning that uses shared base states and decoupled delta states to balance forgetting and retaining objectives, with a quantized variant for reduced memory.
This paper revisits the reliability paradox in the context of machine unlearning for language models, demonstrating that models can achieve low calibration error while relying on shortcut-based decision rules, thereby extending the paradox to unlearned models.
Introduces HF-KCU, a method for efficient machine unlearning in federated learning that uses Krylov subspace approximations to remove a client's contribution, achieving significant speedup over retraining while preserving model accuracy and providing robustness against adversarial perturbations.
This paper introduces an interference-aware framework for multi-task machine unlearning, addressing task-level and instance-level interference through task-aware gradient projection and instance-level gradient orthogonalization, achieving effective unlearning on multi-task computer vision benchmarks.
Proposes ASRU, a controllable multimodal unlearning framework that combines activation steering with a reinforcement learning reward function to improve unlearning effectiveness and generation quality while preserving model utility on Qwen3-VL.
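For readers unfamiliar with the activation steering mentioned in the ASRU entry, the following is a minimal generic sketch of the idea, not ASRU's actual procedure: a hidden activation is shifted along a unit-normalized direction by a scalar strength. The function names `steer` and `projection`, the toy vectors, and the choice to normalize the direction are all assumptions made for illustration; the reinforcement learning reward described in the entry is not modeled here.

```python
import math
from typing import List


def _norm(vec: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def projection(hidden: List[float], direction: List[float]) -> float:
    """Scalar projection of `hidden` onto the unit vector along `direction`."""
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return sum(h * d for h, d in zip(hidden, direction)) / n


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Shift `hidden` by `alpha` units along the normalized `direction`.

    A negative `alpha` moves the activation away from the direction,
    which is the hypothetical "suppress a concept" case.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    n = _norm(direction)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / n for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    h = [1.0, 2.0]          # toy activation (illustrative values only)
    v = [3.0, 4.0]          # toy concept direction (illustrative values only)
    steered = steer(h, v, alpha=-5.0)
    print(projection(h, v), projection(steered, v))
```

Because the direction is normalized, the projection of the activation onto it changes by exactly `alpha`, which makes the steering strength directly interpretable in this toy setting.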