machine-unlearning

#machine-unlearning

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

arXiv cs.CL · 2026-05-15

This paper proposes two new metrics—Knowledge Separability Score (KSS) and Knowledge Persistence Score (KPS)—to evaluate cross-linguistic information removal in multilingual machine unlearning for LLMs, addressing shortcomings of prior per-language evaluation protocols.

Inference-Time Machine Unlearning via Gated Activation Redirection

arXiv cs.LG · 2026-05-14

This paper introduces GUARD-IT, a training-free method for machine unlearning that applies input-dependent activation steering at inference time to remove targeted knowledge from LLMs without modifying weights. It matches or exceeds gradient-based baselines while preserving utility and remaining robust to quantization.
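
As a rough illustration of what input-dependent activation steering can look like, the sketch below gates an edit to a hidden vector on how strongly that vector aligns with a concept direction. It is not GUARD-IT's actual procedure: the function names, the sigmoid gate, and the `threshold` and `sharpness` parameters are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_redirect(hidden, concept_dir, threshold=1.0, sharpness=50.0):
    """Remove the concept component from `hidden` only when the gate fires.

    hidden      -- activation vector (list of floats)
    concept_dir -- unit-length direction assumed to encode the forget concept
    threshold, sharpness -- hypothetical gate parameters, not from the paper
    """
    # Projection of the activation onto the concept direction.
    alignment = sum(h * u for h, u in zip(hidden, concept_dir))
    # Gate is close to 1 for strongly aligned inputs and close to 0 otherwise.
    gate = sigmoid(sharpness * (alignment - threshold))
    # Subtract the gated concept component; model weights are never touched.
    edited = [h - gate * alignment * u for h, u in zip(hidden, concept_dir)]
    return edited, gate


concept = [1.0, 0.0]
on_topic, gate_on = gated_redirect([2.0, 1.0], concept)
off_topic, gate_off = gated_redirect([0.2, 1.0], concept)
```

In this toy setup the on-topic activation loses almost all of its component along `concept`, while the off-topic activation passes through essentially unchanged, which is the behaviour a gate is meant to provide.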

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

Hugging Face Daily Papers · 2026-05-14

This paper identifies a fundamental sparsity-permanence tradeoff where quantization reverses machine unlearning, and proposes MANSU, a method combining causal circuit attribution and null-space projection to achieve quantization-permanent forgetting.
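
One intuition for how quantization can undo a weight edit is that round-to-nearest quantization snaps weights back onto a coarse grid, so a small enough change disappears. The toy sketch below shows that effect with made-up numbers. It is not MANSU and does not reproduce the paper's analysis; the weights, the update, and the grid spacing `step` are all hypothetical.

```python
def quantize(weights, step):
    """Uniform round-to-nearest quantization onto a grid of spacing `step`."""
    return [round(w / step) * step for w in weights]


# Hypothetical weights and a hypothetical unlearning update.
original = [0.50, -0.25, 0.75]
update = [0.01, -0.02, 0.30]
unlearned = [w + d for w, d in zip(original, update)]

step = 0.25  # assumed grid spacing, chosen only for illustration
q_original = quantize(original, step)
q_unlearned = quantize(unlearned, step)

# True where the edit is still visible after quantization.
survived = [a != b for a, b in zip(q_original, q_unlearned)]
```

Here the two small updates round back to the original grid points and only the large one remains visible after quantization.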

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

arXiv cs.LG · 2026-05-13

This paper introduces Asymmetric Langevin Unlearning (ALU), a framework that leverages public data to improve the privacy-utility trade-off in machine unlearning. It demonstrates that ALU reduces unlearning costs and enables mass unlearning while maintaining high model utility.

Wisdom is Knowing What not to Say: Hallucination-Free LLMs Unlearning via Attention Shifting

arXiv cs.CL · 2026-04-20

This paper introduces Attention-Shifting (AS), a framework for selective machine unlearning in LLMs that removes sensitive information while preventing hallucinations and preserving model utility. The method combines importance-aware attention suppression with retention enhancement and preserves up to 15% more accuracy than existing unlearning approaches on standard benchmarks.

CiPO: Counterfactual Unlearning for Large Reasoning Models through Iterative Preference Optimization

arXiv cs.CL · 2026-04-20

CiPO is a framework for machine unlearning in Large Reasoning Models that uses iterative preference optimization with counterfactual reasoning traces to selectively remove unwanted knowledge while preserving reasoning abilities. It addresses the challenge of unlearning in models that rely on chain-of-thought reasoning by generating logically valid alternative reasoning paths during training.

Can Large Language Models Reinvent Foundational Algorithms?

Hugging Face Daily Papers · 2026-04-07

Researchers introduce 'Unlearn-and-Reinvent', a pipeline that removes knowledge of foundational algorithms (e.g., Dijkstra's, Euclid's) from LLMs via unlearning, then tests whether models can independently reinvent them. Results show LLMs can reinvent algorithms with intuitive structures but struggle with those requiring non-obvious data structures or counterintuitive invariants.
