Large Language Model Unlearning via Embedding-Corrupted Prompts
Authors: Chris Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on unlearning, we demonstrate the superiority of our method in achieving promising unlearning at nearly zero side effects in general domains and domains closely related to the unlearned ones. |
| Researcher Affiliation | Academia | University of California, Santa Cruz {yliu298,ywan1225,jmflanig,yangliu}@ucsc.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures) were found. |
| Open Source Code | Yes | We have made our code publicly available at https://github.com/chrisliu298/llm-unlearn-eco. |
| Open Datasets | Yes | The TOFU dataset [90] is a synthetic question-answering dataset of author biographies. For both WMDP [73] and MMLU subset unlearning tasks [55], we directly unlearn on pre-trained models. We select Harry Potter and the Sorcerer's Stone [112] and BBC News articles [75] as the copyrighted content material for unlearning and unlearn models fine-tuned on the text corpus. |
| Dataset Splits | Yes | We strictly follow the original split of the forget and retain sets in the TOFU dataset [90] to train the classifiers. For all prompt classifiers, we use an independent validation set D_val to tune the decision threshold τ and hyperparameters or to calibrate the empirical quantile q̂, which is used to determine conformity. |
| Hardware Specification | Yes | Both models are trained with a batch size of 4, accumulating gradients for 4 steps on 2 NVIDIA A6000 GPUs... We fine-tune them on the copyrighted content corpus... on two NVIDIA A100 GPUs. For all experiments conducted in the paper, we conduct experiments on a node with 8 NVIDIA A100 or NVIDIA A6000 GPUs, but at most three of each are required for a single experiment. |
| Software Dependencies | No | The paper mentions using specific models like RoBERTa and Llama-3.1-1B-Instruct, but it does not provide explicit version numbers for programming languages, libraries, or other software components (e.g., Python 3.x, PyTorch 1.x, CUDA x.x) needed to replicate the experiment environment. |
| Experiment Setup | Yes | Both models are trained with a batch size of 4, accumulating gradients for 4 steps on 2 NVIDIA A6000 GPUs, resulting in an effective batch size of 32, with a learning rate of 1e-5 for Llama-2-7B-Chat and 2e-5 for Phi-1.5. For the copyrighted content unlearning task... we fine-tune all models on the two text corpora for 5 epochs, using a batch size of 4 and a learning rate of 2e-5 on two NVIDIA A100 GPUs. |
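
The Dataset Splits row notes that the prompt classifiers calibrate an empirical quantile q̂ on a held-out validation set to decide conformity. The snippet below is a minimal split-conformal calibration sketch under stated assumptions, not the authors' exact procedure: the score definition, the miscoverage level `alpha`, and the function names are illustrative.

```python
# Generic split-conformal calibration sketch (illustrative only).
# Assumes `cal_scores` are nonconformity scores computed by the prompt
# classifier on the independent validation set D_val; the specific score
# definition and alpha are assumptions, not taken from the paper.
import numpy as np

def calibrate_quantile(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Return q_hat such that roughly (1 - alpha) of calibration scores fall at or below it."""
    n = len(cal_scores)
    # Standard finite-sample quantile level for split conformal prediction.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores, level, method="higher"))

def is_conforming(score: float, q_hat: float) -> bool:
    # A new prompt is treated as conforming when its nonconformity score
    # does not exceed the calibrated threshold q_hat.
    return score <= q_hat
```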
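
The Experiment Setup row reports a per-device batch size of 4 with 4 gradient-accumulation steps on 2 GPUs (effective batch size 32) and learning rates of 1e-5 for Llama-2-7B-Chat and 2e-5 for Phi-1.5. The sketch below shows how those hyperparameters might be expressed with Hugging Face `TrainingArguments`; the paper does not name its training framework, and the `output_dir`, `bf16`, and epoch settings outside the copyrighted-content task are assumptions.

```python
# Hypothetical training configuration mirroring the quoted setup,
# assuming a Hugging Face Trainer-style pipeline (not confirmed by the paper).
from transformers import TrainingArguments

def make_training_args(model_name: str) -> TrainingArguments:
    # Learning rates as reported: 1e-5 for Llama-2-7B-Chat, 2e-5 for Phi-1.5.
    lr = 1e-5 if "Llama-2-7b" in model_name.lower() else 2e-5
    return TrainingArguments(
        output_dir=f"checkpoints/{model_name}",  # hypothetical path
        per_device_train_batch_size=4,           # batch size 4 per GPU
        gradient_accumulation_steps=4,           # 4 x 4 x 2 GPUs = effective batch size 32
        learning_rate=lr,
        num_train_epochs=5,                      # 5 epochs reported for the copyrighted-content corpora
        bf16=True,                               # assumption; precision is not specified in the quote
        report_to="none",
    )
```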