Large Language Model Unlearning via Embedding-Corrupted Prompts

Authors: Chris Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments on unlearning, we demonstrate the superiority of our method in achieving promising unlearning at nearly zero side effects in general domains and domains closely related to the unlearned ones."
Researcher Affiliation | Academia | "University of California, Santa Cruz {yliu298,ywan1225,jmflanig,yangliu}@ucsc.edu"
Pseudocode | No | No structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures) were found.
Open Source Code | Yes | "We have made our code publicly available at https://github.com/chrisliu298/llm-unlearn-eco."
Open Datasets | Yes | "The TOFU dataset [90] is a synthetic question-answering dataset of author biographies. For both WMDP [73] and MMLU subset unlearning tasks [55], we directly unlearn on pre-trained models. We select Harry Potter and the Sorcerer's Stone [112] and BBC News articles [75] as the copyrighted content material for unlearning and unlearn models fine-tuned on the text corpus."
Dataset Splits | Yes | "We strictly follow the original split of the forget and retain sets in the TOFU dataset [90] to train the classifiers. For all prompt classifiers, we use an independent validation set D_val to tune the decision threshold τ and hyperparameters or to calibrate the empirical quantile q̂, which is used to determine conformity." (See the calibration sketch after this table.)
Hardware Specification | Yes | "Both models are trained with a batch size of 4, accumulating gradients for 4 steps on 2 NVIDIA A6000 GPUs... We fine-tune them on the copyrighted content corpus... on two NVIDIA A100 GPUs. For all experiments conducted in the paper, we conduct experiments on a node with 8 NVIDIA A100 or NVIDIA A6000 GPUs, but at most three of each are required for a single experiment."
Software Dependencies | No | The paper names specific models such as RoBERTa and Llama-3.1-1B-Instruct, but it does not give version numbers for programming languages, libraries, or other software components (e.g., Python 3.x, PyTorch 1.x, CUDA x.x) needed to replicate the experiment environment.
Experiment Setup | Yes | "Both models are trained with a batch size of 4, accumulating gradients for 4 steps on 2 NVIDIA A6000 GPUs, resulting in an effective batch size of 32, with a learning rate of 1e-5 for Llama-2-7B-Chat and 2e-5 for Phi-1.5. For the copyrighted content unlearning task... we fine-tune all models on the two text corpora for 5 epochs, using a batch size of 4 and a learning rate of 2e-5 on two NVIDIA A100 GPUs." (See the configuration sketch after this table.)
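The conformity calibration quoted in the Dataset Splits row matches the standard split-conformal recipe: score prompts from a held-out validation set, take a finite-sample-corrected empirical quantile q̂ of those scores, and treat a new prompt as conforming when its score does not exceed q̂. The sketch below is a minimal illustration of that recipe under those assumptions, not the paper's released implementation; the function names, the alpha level, and the placeholder score distribution are invented for the example.

```python
import numpy as np


def calibrate_quantile(cal_scores: np.ndarray, alpha: float = 0.05) -> float:
    """Split-conformal calibration: return the empirical quantile q_hat of the
    calibration scores at the finite-sample-corrected level (n+1)(1-alpha)/n."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(cal_scores, level, method="higher"))


def conforms(score: float, q_hat: float) -> bool:
    """A test prompt 'conforms' to the calibration distribution if its
    nonconformity score does not exceed the calibrated quantile."""
    return score <= q_hat


if __name__ == "__main__":
    # Hypothetical nonconformity scores standing in for classifier scores on D_val.
    rng = np.random.default_rng(0)
    val_scores = rng.beta(2, 5, size=500)
    q_hat = calibrate_quantile(val_scores, alpha=0.05)
    print(f"q_hat = {q_hat:.3f}; score 0.30 conforms: {conforms(0.30, q_hat)}")
```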
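The effective batch size of 32 quoted in the Experiment Setup row is the product of the per-device batch size, the gradient-accumulation steps, and the GPU count (4 × 4 × 2 = 32). The configuration sketch below only reproduces that arithmetic and records the quoted learning rates; the dataclass and field names are illustrative and not taken from the released code.

```python
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    # Values quoted in the paper's experiment setup; field names are assumptions.
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    num_gpus: int = 2
    learning_rate: float = 1e-5  # 1e-5 for Llama-2-7B-Chat, 2e-5 for Phi-1.5

    @property
    def effective_batch_size(self) -> int:
        # 4 per device x 4 accumulation steps x 2 GPUs = 32 sequences per update.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_gpus)


if __name__ == "__main__":
    cfg = FinetuneConfig()
    assert cfg.effective_batch_size == 32
    print(cfg, "->", cfg.effective_batch_size)
```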