Recovering the Pre-Fine-Tuning Weights of Generative Models
Authors: Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. ... We demonstrate the effectiveness of our method by uncovering the vulnerability of real and widely used NLP and Vision models. Our approach achieves remarkable precision on an aligned Mistral model, effectively reversing the alignment training and restoring the original model (See Figure 2). Similarly, on Stable-Diffusion, we recover the original model's weights with a vanishingly small error, showcasing almost perfect reconstruction of the original generation capabilities (See Figure 3). |
| Researcher Affiliation | Academia | Eliahu Horwitz¹, Jonathan Kahana¹, Yedid Hoshen¹. ¹School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel. Correspondence to: Eliahu Horwitz <eliahu.horwitz@mail.huji.ac.il>. |
| Pseudocode | Yes | Algorithm 1: PyTorch Pseudocode for Spectral DeTuning (a hedged sketch of the recovery loop appears after the table). |
| Open Source Code | Yes | The code is available at https://vision.huji.ac.il/spectral_detuning/. |
| Open Datasets | Yes | Our dataset encompasses three pre-trained representative source models: a Vision Transformer (ViT) (Dosovitskiy et al., 2020) trained on ImageNet-1K (Russakovsky et al., 2015), Mistral-7B-v0.1 (Jiang et al., 2023), and Stable Diffusion 1.5 (Rombach et al., 2022). ... For each LoRA we use a different VTAB-1k (Zhai et al., 2019) dataset; the datasets we use are: cifar100, caltech101, dtd, flower102, pet37, svhn, patch camelyon, clevr-count, clevr-distance, dmlab, kitti, dsprites-location, dsprites-orientation, smallnorb-azimuth, smallnorb-elevation. ... the SFT stage uses the UltraChat dataset (Ding et al., 2023) and the DPO stage uses the UltraFeedback dataset (Cui et al., 2023). ... For evaluation we use the first 100 captions from the COCO Captions (Chen et al., 2015) validation dataset. |
| Dataset Splits | Yes | We use an 80/20 train/validation split and choose the checkpoint with the best validation loss. |
| Hardware Specification | Yes | e.g., on a cluster of desktop GPUs such as RTX2080 our method can recover the Pre-FT weights of a Mistral-7B model in under five minutes. |
| Software Dependencies | No | The paper mentions using the PEFT library and provides PyTorch pseudocode, but does not specify version numbers for these or any other software dependencies required for replication. |
| Experiment Setup | Yes | Table 8 (ViT hyper-parameters): lora rank (r) = 16, lora alpha (α) = 16, lr = 9e-3, batch size = 128, epochs = 20. ... Table 9 (Mistral SFT hyper-parameters): lora rank (r) = 64, lora alpha (α) = 64, lora dropout = 0.1, lr = 2e-5, batch size = 4, gradient accumulation steps = 128, learning rate scheduler = Cosine, epochs = 1, warmup ratio = 0.1, data type = bfloat16. ... For both the ViTs and Stable Diffusion (SD) experiments we run Spectral DeTuning for 300 optimization steps. For the Mistral SFT and DPO experiments we use 1000 optimization steps. (See the LoRA configuration sketch after the table.) |
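
The Pseudocode row points to the authors' Algorithm 1 (PyTorch pseudocode for Spectral DeTuning). As a hedged illustration of the underlying idea only, the sketch below alternates between a rank-r truncated SVD of each residual W_i - W* and re-averaging the estimated pre-fine-tuning matrix W*. It is not the authors' implementation (that is available at https://vision.huji.ac.il/spectral_detuning/), it omits refinements described in the paper such as the rank scheduler, and the function and variable names are assumptions.

```python
# Minimal sketch (not the authors' Algorithm 1): recover a shared pre-fine-tuning
# weight matrix W* from several LoRA fine-tuned copies W_i = W* + B_i A_i by
# alternating between (1) a rank-r truncated SVD of each residual W_i - W* and
# (2) re-estimating W* as the average of W_i minus its low-rank update.
import torch

def recover_pre_ft_weight(finetuned_mats, rank, num_steps=300):
    # finetuned_mats: list of (d_out, d_in) tensors, one per LoRA fine-tune
    stacked = torch.stack(finetuned_mats)          # (n, d_out, d_in)
    w_star = stacked.mean(dim=0)                   # initial guess for W*
    for _ in range(num_steps):
        # Step 1: best rank-r approximation of each residual W_i - W*
        residuals = stacked - w_star               # (n, d_out, d_in)
        U, S, Vh = torch.linalg.svd(residuals, full_matrices=False)
        low_rank = (U[..., :rank] * S[..., None, :rank]) @ Vh[..., :rank, :]
        # Step 2: W* that best explains all fine-tunes given the low-rank updates
        w_star = (stacked - low_rank).mean(dim=0)
    return w_star

# Usage sketch with random matrices standing in for real LoRA fine-tunes.
if __name__ == "__main__":
    torch.manual_seed(0)
    d_out, d_in, r, n = 64, 32, 4, 8
    w_true = torch.randn(d_out, d_in)
    fakes = [w_true + 0.1 * torch.randn(d_out, r) @ torch.randn(r, d_in) for _ in range(n)]
    w_rec = recover_pre_ft_weight(fakes, rank=r)
    print("recovery error:", torch.norm(w_rec - w_true).item())
```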
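
Since the Software Dependencies row notes that the PEFT library is used without pinned versions, the following is a minimal, hedged sketch of how the quoted ViT hyper-parameters (Table 8) would map onto a `peft` `LoraConfig`. The `target_modules` value and the name `vit_lora_config` are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: expressing the quoted ViT LoRA hyper-parameters (Table 8) with
# Hugging Face `peft`. The `target_modules` choice is an assumption; the quoted
# table does not say which ViT layers were adapted.
from peft import LoraConfig

vit_lora_config = LoraConfig(
    r=16,                               # lora rank (r) = 16
    lora_alpha=16,                      # lora alpha (α) = 16
    target_modules=["query", "value"],  # ASSUMPTION: typical ViT attention projections
)

# The remaining Table 8 values (lr = 9e-3, batch size = 128, epochs = 20) belong to
# the optimizer / training loop, not to LoraConfig.
```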