Large Language Model Unlearning
Authors: Yuanshun Yao, Xiaojun Xu, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a series of empirical studies that highlight the difference between unlearning on traditional models and LLMs in Appendix C. We incorporate three key lessons. (1) We continue to unlearn for 3x-5x more batches after we have observed the loss on forgetting samples rise to an abnormally high level. (2) To preserve normal utility, we minimize the KL divergence between the predicted distributions of the original and the unlearned LLM on x_nor, i.e. Eqn. (6). (3) We choose D_nor to be in the same format as D_fgt; e.g., to unlearn the harmful data from PKU-SafeRLHF, which is in Q&A format, we use TruthfulQA as the normal data. |
| Researcher Affiliation | Collaboration | Yuanshun Yao (Meta GenAI, kevinyao@meta.com); Xiaojun Xu (ByteDance Research, xiaojun.xu@bytedance.com); Yang Liu (UC Santa Cruz, yangliu@ucsc.edu) |
| Pseudocode | No | The paper presents mathematical equations for its update rule and loss functions (Eqns. 2-6) but does not provide a clearly labeled pseudocode block or algorithm box. |
| Open Source Code | Yes | We submitted our code in the supplementary file. |
| Open Datasets | Yes | We use harmful Q&A pairs in the PKU-SafeRLHF [18] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor. We use Harry Potter and the Sorcerer's Stone as the copyright corpus (HP data, in short). We select the hallucinated Q&A pairs (i.e. negative samples) in the HaluEval [21] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor. |
| Dataset Splits | Yes | We split D_fgt into 70% for training, 10% for validation, and 20% for testing. |
| Hardware Specification | Yes | We report the run time on a single NVIDIA A100 SXM4 80 GB GPU in Figure 1. |
| Software Dependencies | No | The paper mentions specific LLM models (OPT-1.3B, OPT-2.7B, Llama2-7B) and datasets, but it does not specify software dependencies like programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, Hugging Face Transformers). |
| Experiment Setup | Yes | Table 8: Unlearning Harmfulness: Hyperparameter setting. Table 9: Unlearning Copyrighted Content: Hyperparameter setting. Table 10: Reducing Hallucination: Hyperparameter setting. These tables provide specific hyperparameters such as '# of unlearning batches', 'Batch Size', 'ϵ1', 'ϵ2', 'ϵ3', and 'Learning Rate'. |
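The rows above describe a three-term objective: gradient ascent on the forgetting samples, a random-mismatch loss, and a KL term that keeps the unlearned model's predictions on normal data close to the original model's, with weights ε1, ε2, ε3 (the hyperparameters listed in Tables 8-10). A minimal sketch of how these terms could combine, assuming the paper's weighted-sum formulation; function names and the discrete-distribution KL are illustrative, not the authors' implementation:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete predictive distributions.
    Stands in for the term that keeps the unlearned LLM close to the
    original LLM on normal data x_nor (Eqn. 6 in the paper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def unlearning_loss(loss_fgt, loss_rdn, kl_nor, eps1=1.0, eps2=1.0, eps3=1.0):
    """Combined objective sketch: the negative sign on loss_fgt turns
    gradient descent into gradient ascent on forgetting samples, while
    the random-mismatch loss and the KL term are minimized as usual."""
    return -eps1 * loss_fgt + eps2 * loss_rdn + eps3 * kl_nor

# Illustrative use: identical predictions on normal data contribute zero KL,
# so only the ascent and mismatch terms drive the update.
p_orig = [0.7, 0.2, 0.1]
total = unlearning_loss(loss_fgt=2.0, loss_rdn=0.5,
                        kl_nor=kl_divergence(p_orig, p_orig))
```

Minimizing `unlearning_loss` with an optimizer then raises the loss on D_fgt (the "abnormally high level" the authors monitor) while the ε3-weighted KL term restrains drift on D_nor.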