Rethinking LLM Memorization through the Lens of Adversarial Compression

Authors: Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary Lipton, J. Zico Kolter

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show the practical value of our definition and algorithm through several case studies, and we run validation experiments showing that the definition meets our expectations around memorization. |
| Researcher Affiliation | Academia | Avi Schwarzschild (schwarzschild@cmu.edu), Carnegie Mellon University; Zhili Feng (zhilif@andrew.cmu.edu), Carnegie Mellon University; Pratyush Maini (pratyushmaini@cmu.edu), Carnegie Mellon University; Zachary C. Lipton, Carnegie Mellon University; J. Zico Kolter, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 MINIPROMPT...; Algorithm 2 Greedy Coordinate Gradient (GCG)...; Algorithm 3 Random Search (a hedged sketch of the search loop and compression check appears below the table). |
| Open Source Code | Yes | Project page: https://locuslab.github.io/acr-memorization |
| Open Datasets | Yes | We compare completion and compression tests on the TOFU dataset [Maini et al., 2024]. This dataset is released under the MIT License and their assets and license can be found at https://huggingface.co/datasets/locuslab/TOFU. (A loading sketch appears below the table.) |
| Dataset Splits | No | The paper mentions using 4,000 QA samples for finetuning and unlearning, but does not explicitly detail train/validation/test splits with percentages or sample counts. |
| Hardware Specification | Yes | With a single NVIDIA RTX A4000 GPU we can compute a minimal prompt in a few minutes if it is highly compressible, and in a few hours (around 10 in the worst case) if we need to search for very long prompts. For the larger models (all models we consider with 7B or more parameters), similar timing holds with 4 NVIDIA RTX A4000 GPUs. |
| Software Dependencies | No | The paper mentions specific models like Pythia and Llama-2-7B-chat, and algorithms like GCG, but does not provide details on general software dependencies (e.g., Python, PyTorch, or other libraries) with specific version numbers. |
| Experiment Setup | Yes | Following the TOFU framework [Maini et al., 2024], we finetune with a learning rate of 2 × 10^-5 and reduce the learning rate during unlearning to 1 × 10^-5. Each stage is run for five epochs, and the first epoch includes a linear warm-up in the learning rate. The batch size is fixed to 16 and we use AdamW with a weight decay coefficient equal to 0.01. (A configuration sketch appears below the table.) |
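
The Pseudocode row lists MINIPROMPT together with its two inner optimizers, GCG and Random Search. The sketch below is not the authors' implementation; it illustrates the random-search variant and the compression check behind the paper's definition, in which a target string counts as memorized when a prompt shorter than the string (in tokens) elicits it. The model name, prompt length, and step budget are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of random-search prompt optimization
# in the spirit of Algorithm 3, plus the compression ratio used by MINIPROMPT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-410m"  # assumption: any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def target_loss(prompt_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Cross-entropy of the target continuation given a candidate prompt."""
    input_ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0)
    labels = input_ids.clone()
    labels[0, : prompt_ids.numel()] = -100  # score only the target tokens
    with torch.no_grad():
        return model(input_ids=input_ids, labels=labels).loss.item()


def random_search_prompt(target: str, prompt_len: int = 4, steps: int = 200):
    """Random search over hard prompt tokens of a fixed length."""
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    vocab_size = model.get_input_embeddings().num_embeddings
    prompt_ids = torch.randint(0, vocab_size, (prompt_len,))
    best = target_loss(prompt_ids, target_ids)
    for _ in range(steps):
        candidate = prompt_ids.clone()
        pos = torch.randint(0, prompt_len, (1,))
        candidate[pos] = torch.randint(0, vocab_size, (1,))
        loss = target_loss(candidate, target_ids)
        if loss < best:  # keep a mutation only if it lowers the loss
            prompt_ids, best = candidate, loss
    # Compression ratio for this prompt length: target tokens per prompt token.
    acr = target_ids.numel() / prompt_len
    return prompt_ids, best, acr


prompt_ids, loss, acr = random_search_prompt("The quick brown fox jumps over the lazy dog.")
print(tok.decode(prompt_ids), loss, acr)
```

In the paper's procedure, the prompt must reproduce the target exactly under greedy decoding before it counts, and the search is run over prompt lengths to find the shortest successful prompt; the ratio above is only the per-length version of that check.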
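
The TOFU data referenced in the Open Datasets row is hosted on the Hugging Face Hub. A minimal loading sketch, assuming the "full" configuration and "train" split shown on the dataset card:

```python
from datasets import load_dataset

# Assumption: the dataset card exposes a "full" configuration with a "train" split.
tofu = load_dataset("locuslab/TOFU", "full")
print(tofu["train"][0])  # each record is a question/answer pair
```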
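
The Experiment Setup row reports the finetuning and unlearning hyperparameters. A hedged sketch of how they map onto Hugging Face `TrainingArguments`; the warm-up ratio and optimizer name are interpretations, not values stated verbatim in the paper, and the authors' actual training code may differ in details such as the scheduler or gradient accumulation.

```python
from transformers import TrainingArguments

finetune_args = TrainingArguments(
    output_dir="tofu-finetune",
    learning_rate=2e-5,              # 2 x 10^-5 for finetuning
    num_train_epochs=5,              # five epochs per stage
    per_device_train_batch_size=16,  # batch size 16 (single device assumed)
    warmup_ratio=0.2,                # linear warm-up over the first of five epochs (interpretation)
    optim="adamw_torch",
    weight_decay=0.01,               # AdamW weight decay coefficient
)

# Unlearning reuses the same settings with the reduced learning rate of 1 x 10^-5.
unlearn_args = TrainingArguments(
    output_dir="tofu-unlearn",
    learning_rate=1e-5,
    num_train_epochs=5,
    per_device_train_batch_size=16,
    warmup_ratio=0.2,
    optim="adamw_torch",
    weight_decay=0.01,
)
```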