In a nutshell: RepSelect isolates forget-set-specific representations by selectively collapsing gradient components, and achieves a 4-50x greater reduction in post-relearning accuracy than the strongest baseline.

Researchers present RepSelect, a method that enables large language models (LLMs) to permanently forget specific content without compromising their general capabilities. The approach remains robust against attempts to reverse the unlearning through fine-tuning.

Prior unlearning methods modify representations that are shared with the retain set and that lie in a subspace an attacker can recover through fine-tuning. This has two consequences: unlearning damages the model's general capabilities, and the forgetting can be reversed with a handful of examples (few-shot prompting) or with targeted retraining.

RepSelect addresses this through representation selectivity: the method isolates representations that are exclusively linked to the forget set by collapsing the top principal components of weight gradients before each update. This preserves the model’s general capabilities while leaving significantly less material available to a fine-tuning attacker.
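To make the idea of collapsing top principal components concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes one plausible reading of the description: the gradient of a single weight matrix is decomposed with an SVD and its `k` largest singular directions are zeroed before the update is applied. The function name `collapse_top_components`, the parameter `k`, and the synthetic gradient are all hypothetical and chosen only for illustration.

```python
import numpy as np


def collapse_top_components(grad, k=1):
    """Zero the k largest singular directions of a 2-D gradient matrix.

    Assumption for illustration: the dominant directions are the ones most
    likely to be shared with general capabilities, so they are dropped and
    only the residual is kept for the weight update.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0           # singular values are sorted in descending order
    return (u * s) @ vt   # rebuild the gradient without the top-k directions


# Synthetic gradient: one strong rank-1 direction plus a small residual.
rng = np.random.default_rng(0)
dominant = 10.0 * np.outer(rng.normal(size=8), rng.normal(size=6))
residual = 0.1 * rng.normal(size=(8, 6))
grad = dominant + residual

filtered = collapse_top_components(grad, k=1)

print("largest singular value before:", np.linalg.svd(grad, compute_uv=False)[0])
print("largest singular value after: ", np.linalg.svd(filtered, compute_uv=False)[0])
```

In a training loop, a filter of this kind would be applied to each weight gradient before the optimizer step. How many components to collapse, and over which quantities the components are computed, are details the summary above does not specify.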

The evaluation covers two forget categories (biohazardous knowledge and abusive tendencies) and four model families: Llama 3, Qwen 3.5, Gemma 4 E4B, and DeepSeek V2 Lite. Together these span both dense and mixture-of-experts architectures. RepSelect is compared against five established baselines: GradDiff, NPO, SimNPO, RMU, and UNDIAL.

The results show that RepSelect achieves a 4-50x greater reduction in post-relearning accuracy than the strongest baseline and is almost completely robust against few-shot prompting attacks. These results indicate that selectively targeting representations is a necessary step toward robust and thorough unlearning in LLMs.

Source: arxiv.org · Published June 14, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrasing and classification via Lumi News Pipeline v1.7.1.

RepSelect: A New Approach to Robust Unlearning in Large Language Models