llm-unlearning


CBD: API-Only LLM Black-Box Unlearning through Controlled Behavioral Divergence

arXiv cs.LG · yesterday

CBD is an API-only, black-box unlearning framework for LLMs. It uses two auxiliary models to create controlled behavioral divergence between retained and target data, achieving a better unlearning-utility trade-off than existing methods.


RepSelect: Robust LLM Unlearning via Representation Selectivity

arXiv cs.CL · 2026-06-17

RepSelect is a method for robust LLM unlearning that isolates forget-set-specific representations by collapsing the top principal components of weight gradients. It achieves 4-50× better robustness against relearning attacks than existing baselines across multiple model families.


Measuring the Depth of LLM Unlearning via Activation Patching

arXiv cs.CL · 2026-05-26

The paper proposes the Unlearning Depth Score (UDS), a metric that uses activation patching to quantify how thoroughly target knowledge is erased from LLMs, achieving state-of-the-art faithfulness and robustness across multiple unlearning methods.


Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

arXiv cs.CL · 2026-05-13

This paper introduces Minor Component Unlearning (MCU), an approach to LLM unlearning that targets the minor components of representations to resist relearning attacks. It addresses the vulnerability of existing methods to such attacks by focusing on robust directions within the model's spectral structure.
