Recovering the Pre-Fine-Tuning Weights of Generative Models

Authors: Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. ... We demonstrate the effectiveness of our method by uncovering the vulnerability of real and widely used NLP and Vision models. Our approach achieves remarkable precision on an aligned Mistral model, effectively reversing the alignment training and restoring the original model (See Figure 2). Similarly, on Stable-Diffusion, we recover the original model's weights with a vanishingly small error, showcasing almost perfect reconstruction of the original generation capabilities (See Figure 3). (The recovery problem described here is restated as a short equation after the table.)
Researcher Affiliation | Academia | Eliahu Horwitz (1), Jonathan Kahana (1), Yedid Hoshen (1); (1) School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel. Correspondence to: Eliahu Horwitz <eliahu.horwitz@mail.huji.ac.il>.
Pseudocode | Yes | Algorithm 1: PyTorch Pseudocode for Spectral DeTuning. (An illustrative PyTorch sketch of the same idea appears after the table.)
Open Source Code | Yes | The code is available at https://vision.huji.ac.il/spectral_detuning/.
Open Datasets | Yes | Our dataset encompasses three pre-trained representative source models: a Vision Transformer (ViT) (Dosovitskiy et al., 2020) trained on ImageNet-1K (Russakovsky et al., 2015), Mistral-7B-v0.1 (Jiang et al., 2023), and Stable Diffusion 1.5 (Rombach et al., 2022). ... For each LoRA we use a different VTAB-1k (Zhai et al., 2019) dataset; the datasets we use are: cifar100, caltech101, dtd, flower102, pet37, svhn, patch camelyon, clevr-count, clevr-distance, dmlab, kitti, dsprites-location, dsprites-orientation, smallnorb-azimuth, smallnorb-elevation. ... The SFT stage uses the UltraChat dataset (Ding et al., 2023) and the DPO stage uses the UltraFeedback dataset (Cui et al., 2023). ... For evaluation we use the first 100 captions from the COCO Captions (Chen et al., 2015) validation dataset.
Dataset Splits | Yes | We use an 80/20 train/validation split and choose the checkpoint with the best validation loss. (A minimal split-and-checkpointing sketch appears after the table.)
Hardware Specification | Yes | e.g., on a cluster of desktop GPUs such as RTX2080 our method can recover the Pre-FT weights of a Mistral-7B model in under five minutes.
Software Dependencies | No | The paper mentions using the PEFT library and provides PyTorch pseudocode, but does not specify version numbers for these or any other software dependencies required for replication.
Experiment Setup | Yes | Table 8 (ViT hyper-parameters): lora rank (r) = 16, lora alpha (α) = 16, lr = 9e-3, batch size = 128, epochs = 20. ... Table 9 (Mistral SFT hyper-parameters): lora rank (r) = 64, lora alpha (α) = 64, lora dropout = 0.1, lr = 2e-5, batch size = 4, gradient accumulation steps = 128, learning rate scheduler = Cosine, epochs = 1, warmup ratio = 0.1, data type = bfloat16. ... For both the ViT and Stable Diffusion (SD) experiments we run Spectral DeTuning for 300 optimization steps. For the Mistral SFT and DPO experiments we use 1000 optimization steps. (A sketch mapping the Table 8 values onto a PEFT configuration appears after the table.)
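
For readers of the Research Type row above, the recovery setting can be stated compactly as follows. This is our paraphrase of the problem implied by the quoted abstract, not an equation copied from the paper: given n fine-tuned weight matrices W'_i, each assumed to be a rank-r LoRA update of a shared pre-fine-tuning matrix W*, Spectral DeTuning seeks

$$\min_{W^{*},\,\{A_i, B_i\}_{i=1}^{n}} \;\sum_{i=1}^{n} \left\lVert W'_i - \left( W^{*} + B_i A_i \right) \right\rVert_F^{2}, \qquad \operatorname{rank}(B_i A_i) \le r,$$

i.e., it jointly estimates the shared pre-FT matrix and one low-rank residual per fine-tuned model.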
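To complement the Pseudocode row, here is a minimal, self-contained PyTorch sketch of the alternating low-rank recovery idea on synthetic matrices. It is written for this summary and is not the authors' Algorithm 1 or their released code; the function name and the toy setup are ours, and details such as the paper's rank scheduler are omitted.

```python
# Illustrative sketch (not the authors' released code): recover a shared
# pre-fine-tuning matrix from several LoRA fine-tuned copies by alternating
# truncated-SVD estimates of each low-rank residual with a mean update.
import torch


def recover_pre_ft(finetuned, rank, steps=300):
    w_star = torch.stack(finetuned).mean(dim=0)  # initial pre-FT estimate
    for _ in range(steps):
        delora = []
        for w in finetuned:
            # Best rank-`rank` approximation of the residual via truncated SVD
            u, s, vh = torch.linalg.svd(w - w_star, full_matrices=False)
            delora.append(w - (u[:, :rank] * s[:rank]) @ vh[:rank])
        w_star = torch.stack(delora).mean(dim=0)  # re-estimate the pre-FT matrix
    return w_star


# Toy usage: 5 simulated LoRA fine-tunes (rank 4) of one 64x64 pre-FT matrix
torch.manual_seed(0)
w_true = torch.randn(64, 64)
fts = [w_true + 0.1 * torch.randn(64, 4) @ torch.randn(4, 64) for _ in range(5)]
est = recover_pre_ft(fts, rank=4)
print("relative error:", (torch.norm(est - w_true) / torch.norm(w_true)).item())
```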
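The Dataset Splits row reports an 80/20 train/validation split with best-validation checkpoint selection. A minimal sketch of that loop is below; `train_one_epoch` and `evaluate` are hypothetical placeholders, not functions from the paper's repository.

```python
# Illustrative 80/20 train/validation split with best-validation checkpointing.
# `train_one_epoch` and `evaluate` are hypothetical placeholders.
import copy

from torch.utils.data import random_split


def fit_with_split(model, dataset, epochs, train_one_epoch, evaluate):
    n_train = int(0.8 * len(dataset))
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        train_one_epoch(model, train_set)
        val_loss = evaluate(model, val_set)
        if val_loss < best_loss:  # keep the checkpoint with the best validation loss
            best_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```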
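The Software Dependencies and Experiment Setup rows mention the PEFT library and the Table 8 ViT LoRA hyper-parameters. As a rough guide to how those values could map onto a PEFT configuration, here is a hedged sketch; the timm backbone, `target_modules=["qkv"]`, the class count, and the AdamW optimizer are assumptions, not details stated in the quoted rows.

```python
# Sketch of expressing the Table 8 ViT LoRA hyper-parameters with the PEFT library.
# The backbone, target_modules, num_classes, and optimizer are assumptions.
import timm
import torch
from peft import LoraConfig, get_peft_model

# Assumed ViT backbone (the paper uses a ViT trained on ImageNet-1K; the exact
# checkpoint is an assumption). num_classes=100 assumes the cifar100 LoRA.
base = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=100)

lora_cfg = LoraConfig(
    r=16,                    # lora rank (r), Table 8
    lora_alpha=16,           # lora alpha (α), Table 8
    target_modules=["qkv"],  # assumption: attention qkv projections in timm ViT blocks
)
model = get_peft_model(base, lora_cfg)

optimizer = torch.optim.AdamW(model.parameters(), lr=9e-3)  # lr from Table 8; optimizer assumed
# Batch size 128 and 20 epochs (Table 8) would be set in the training loop.
```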