Representation Surgery: Theory and Practice of Affine Steering
Authors: Shashwat Singh, Shauli Ravfogel, Jonathan Herzig, Roee Aharoni, Ryan Cotterell, Ponnurangam Kumaraguru
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Second, we offer a series of experiments that demonstrate the empirical effectiveness of the methods in mitigating bias and reducing toxic generation. |
| Researcher Affiliation | Collaboration | ¹IIIT Hyderabad, ²Bar-Ilan University (work done during an internship at Google Research), ³Google Research, ⁴ETH Zurich. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/shauliravfogel/affine-steering |
| Open Datasets | Yes | we experiment on the Bios dataset (De-Arteaga et al., 2019)...we consider Blodgett et al.'s (2016) dataset on various dialects of American English...Our two affine steering functions are fitted on balanced classification data that consists of full sentences with human toxicity labels, the Toxic Comments Classification Challenge data. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge...WikiText-2 (Merity et al., 2017). |
| Dataset Splits | No | The paper mentions training data and development sets (e.g., 'training section of the Bios dataset', 'development set accuracy') but does not specify explicit percentages or sample counts for the training, validation, or test splits. It does refer to 'a split of 10k samples from the non-toxic split of Real Toxicity Prompts' used for evaluation, which serves as a test set, but it gives no full breakdown of all splits. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | Yes | We use the Python Optimal Transport (Flamary et al., 2021) implementation of the mean and covariance matching transformation...The MLP was trained in Scikit-learn (Pedregosa et al., 2011) version 1.3.2 with the default parameters. A sketch of the POT fitting step follows the table. |
| Experiment Setup | Yes | To embed the biography using a single vector, we take the last-layer CLS representation for BERT and take the last-token, last-hidden-layer representations over the text for the other models. We lower the dimensionality of the Llama2 vectors to 768 using PCA. Then, we fit a logistic regression classifier...We use the same decoding sampling parameters as in Liu et al. (2021); Pozzobon et al. (2023); Gehman et al. (2020); they are listed in Table 5: number of samples 25, max length 20, temperature 1, top-p (sampling) 0.9, top-k (sampling) 0, for all models. A sketch of this probing pipeline follows the table. |
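
The POT dependency quoted above implements the paper's mean-and-covariance matching transformation, which is the closed-form affine optimal-transport map between two Gaussian moment estimates. A minimal sketch of that fitting step, assuming POT's `ot.da.LinearTransport` estimator and synthetic stand-in representations (illustrative only, not the authors' code):

```python
# Minimal sketch: fit an affine map x -> A x + b that matches the mean and
# covariance of one group's representations to the other's, using POT's
# closed-form linear (Gaussian) optimal transport. Data here is synthetic.
import numpy as np
import ot  # Python Optimal Transport (Flamary et al., 2021)

rng = np.random.default_rng(0)
d = 768
X_src = rng.normal(0.0, 1.0, size=(1000, d))  # representations with concept value 0
X_tgt = rng.normal(0.5, 2.0, size=(1000, d))  # representations with concept value 1

# LinearTransport estimates A and b from the empirical means and covariances,
# so the mapped source distribution matches the target's first two moments.
mapping = ot.da.LinearTransport()
mapping.fit(Xs=X_src, Xt=X_tgt)

X_steered = mapping.transform(Xs=X_src)
print(X_steered.shape)  # (1000, 768)
```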
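
Likewise, the probing pipeline in the experiment-setup row (last-token, last-hidden-layer embeddings, PCA, then a logistic regression classifier) can be sketched as below. The `gpt2` checkpoint and toy biographies are stand-in assumptions so the snippet runs without gated weights; the paper uses BERT/Llama2 representations of the Bios dataset:

```python
# Sketch of the probing pipeline: embed each text with a decoder model's
# last-token, last-hidden-layer state, optionally reduce dimensionality
# with PCA, and fit a logistic regression probe on concept labels.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# "gpt2" stands in for the paper's models (BERT, GPT2-XL, Llama2);
# swap in the actual checkpoint to approximate the reported setup.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def embed(texts):
    """Last-token, last-hidden-layer representation for each text."""
    vecs = []
    for t in texts:
        with torch.no_grad():
            out = model(**tok(t, return_tensors="pt"), output_hidden_states=True)
        vecs.append(out.hidden_states[-1][0, -1].numpy())
    return np.stack(vecs)

texts = ["She is a surgeon at a large hospital.",
         "He works as a software engineer."]  # toy stand-ins for Bios biographies
labels = [1, 0]                               # e.g. binary gender labels

X = embed(texts)
# For Llama2 (hidden size 4096) the paper first reduces to 768 dims; with
# enough samples that step would be:
# X = PCA(n_components=768).fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```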