Large Language Model Unlearning

Authors: Yuanshun Yao, Xiaojun Xu, Yang Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform a series of empirical studies that highlight the difference between unlearning on traditional models and LLMs in Appendix C. We incorporate three key lessons. (1) We continue to unlearn for 3x-5x more batches after we have observed the loss on forgetting samples rise to an abnormally high level. (2) To preserve normal utility, we minimize the KL divergence between the original and the unlearned LLM's predicted distributions on x_nor, i.e. Eqn. (6). (3) We choose D_nor to have the same format as D_fgt; e.g., to unlearn the harmful data from PKU-SafeRLHF, which is in Q&A format, we use TruthfulQA as the normal data."
Researcher Affiliation | Collaboration | Yuanshun Yao (Meta GenAI, kevinyao@meta.com); Xiaojun Xu (ByteDance Research, xiaojun.xu@bytedance.com); Yang Liu (UC Santa Cruz, yangliu@ucsc.edu)
Pseudocode | No | The paper presents mathematical equations for its update rule and loss functions (Eqns. 2-6) but does not provide a clearly labeled pseudocode block or algorithm box.
Open Source Code | Yes | "We submitted our code in the supplementary file."
Open Datasets | Yes | "We use harmful Q&A pairs in the PKU-SafeRLHF [18] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor. We use Harry Potter and the Sorcerer's Stone as the copyright corpus, HP data in short. We select the hallucinated Q&A pairs (i.e. negative samples) in the HaluEval [21] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor."
Dataset Splits | Yes | "We split D_fgt into 70% for training, 10% for validation, and 20% for testing."
Hardware Specification | Yes | "We report the run time on a single NVIDIA A100 SXM4 80 GB GPU in Figure 1."
Software Dependencies | No | The paper mentions specific LLM models (OPT-1.3B, OPT-2.7B, Llama2-7B) and datasets, but it does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, Hugging Face Transformers).
Experiment Setup | Yes | Table 8 (Unlearning Harmfulness), Table 9 (Unlearning Copyrighted Content), and Table 10 (Reducing Hallucination) give hyperparameter settings, including '# of unlearning batches', 'Batch Size', 'ϵ1', 'ϵ2', 'ϵ3', and 'Learning Rate'.
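The three lessons quoted under Research Type combine into one scalar objective: gradient ascent on the forgetting loss (a negated term), a mismatch loss on random completions, and a KL term that keeps predictions on normal data close to the original model's. The sketch below is not the authors' released code; the function names, the exact weighting by eps1/eps2/eps3, and the sign conventions are our reading of Eqns. (2)-(6) and should be treated as assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_unlearning_loss(loss_fgt, loss_rnd, p_orig, p_unlearned,
                             eps1=1.0, eps2=1.0, eps3=1.0):
    """Scalar unlearning objective with three weighted terms:

    - -eps1 * loss_fgt: minimizing this *ascends* the forgetting loss,
      pushing the model away from the forget data D_fgt;
    - +eps2 * loss_rnd: loss on random completions for the forget prompts;
    - +eps3 * KL(p_orig || p_unlearned): keeps the unlearned model's
      predictions on normal data x_nor close to the original model's.
    """
    l_nor = kl_divergence(p_orig, p_unlearned)
    return -eps1 * loss_fgt + eps2 * loss_rnd + eps3 * l_nor
```

In a training loop one would backpropagate through this objective and, per lesson (1), keep unlearning for several times more batches after the forgetting loss blows up rather than stopping at the first spike; the exact stopping rule is not spelled out beyond "3x-5x more batches", so we do not sketch it here.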
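The 70%/10%/20% split of D_fgt quoted under Dataset Splits can be sketched as below; the function name, the fixed seed, and the shuffle-then-slice strategy are illustrative assumptions, not details from the paper.

```python
import random


def split_dataset(examples, train=0.7, val=0.1, test=0.2, seed=0):
    """Shuffle examples and split them into train/validation/test partitions
    by the given ratios (defaults match the paper's quoted 70/10/20 split)."""
    assert abs(train + val + test - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

For 100 forget examples this yields partitions of 70, 10, and 20 items whose union is the original set.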