Large Language Model Unlearning
Authors: Yuanshun Yao, Xiaojun Xu, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a series of empirical studies that highlight the difference between unlearning on traditional models and LLMs in Appendix C. We incorporate three key lessons. (1) We continue to unlearn for 3x-5x more batches after we have observed the loss on forgetting samples rise to an abnormally high level. (2) To preserve normal utility, we minimize the KL divergence between the predicted distributions of the original and the unlearned LLM on x_nor, i.e. Eqn. (6). (3) We choose D_nor to be in the same format as D_fgt; e.g., to unlearn the harmful data from PKU-SafeRLHF, which is in Q&A format, we use TruthfulQA as the normal data. |
| Researcher Affiliation | Collaboration | Yuanshun Yao (Meta GenAI, kevinyao@meta.com); Xiaojun Xu (ByteDance Research, xiaojun.xu@bytedance.com); Yang Liu (UC Santa Cruz, yangliu@ucsc.edu) |
| Pseudocode | No | The paper presents mathematical equations for its update rule and loss functions (Eqns. 2-6) but does not provide a clearly labeled pseudocode block or algorithm box. |
| Open Source Code | Yes | We submitted our code in the supplementary file. |
| Open Datasets | Yes | We use harmful Q&A pairs in the PKU-SafeRLHF [18] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor. We use Harry Potter and the Sorcerer's Stone as the copyright corpus (HP data, in short). We select the hallucinated Q&A pairs (i.e. negative samples) in the HaluEval [21] dataset as D_fgt and the TruthfulQA [22] dataset as D_nor. |
| Dataset Splits | Yes | We split D_fgt into 70% for training, 10% for validation, and 20% for testing. |
| Hardware Specification | Yes | We report the run time on a single NVIDIA A100 SXM4 80 GB GPU in Figure 1. |
| Software Dependencies | No | The paper mentions specific LLM models (OPT-1.3B, OPT-2.7B, Llama2-7B) and datasets, but it does not specify software dependencies like programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, Hugging Face Transformers). |
| Experiment Setup | Yes | Table 8: Unlearning Harmfulness: Hyperparameter setting. Table 9: Unlearning Copyrighted Content: Hyperparameter setting. Table 10: Reducing Hallucination: Hyperparameter setting. These tables provide specific hyperparameters such as '# of unlearning batches', 'Batch Size', 'ϵ1', 'ϵ2', 'ϵ3', and 'Learning Rate'. |
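The rows above describe a three-term objective: gradient ascent on the forgetting samples, a random-mismatch loss, and a KL term that keeps the unlearned model's predictions on normal data close to the original model's, with weights ε1, ε2, ε3 (the hyperparameters listed in Tables 8-10). A minimal sketch of how these terms could combine, assuming the paper's weighted-sum formulation; function names and the discrete-distribution KL are illustrative, not the authors' implementation:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two discrete predictive distributions.
    Stands in for the term that keeps the unlearned LLM close to the
    original LLM on normal data x_nor (Eqn. 6 in the paper)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def unlearning_loss(loss_fgt, loss_rdn, kl_nor, eps1=1.0, eps2=1.0, eps3=1.0):
    """Combined objective sketch: the negative sign on loss_fgt turns
    gradient descent into gradient ascent on forgetting samples, while
    the random-mismatch loss and the KL term are minimized as usual."""
    return -eps1 * loss_fgt + eps2 * loss_rdn + eps3 * kl_nor

# Illustrative use: identical predictions on normal data contribute zero KL,
# so only the ascent and mismatch terms drive the update.
p_orig = [0.7, 0.2, 0.1]
total = unlearning_loss(loss_fgt=2.0, loss_rdn=0.5,
                        kl_nor=kl_divergence(p_orig, p_orig))
```

Minimizing `unlearning_loss` with an optimizer then raises the loss on D_fgt (the "abnormally high level" the authors monitor) while the ε3-weighted KL term restrains drift on D_nor.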