LEACE: Perfect linear concept erasure in closed form
Authors: Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply LEACE to large language models with a novel procedure called concept scrubbing, which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. |
| Researcher Affiliation | Collaboration | ¹EleutherAI ²Bar-Ilan University ³ETH Zürich ⁴Booz Allen Hamilton {nora,stella}@eleuther.ai david@davidsj.com |
| Pseudocode | Yes | Algorithm 1 Concept scrubbing |
| Open Source Code | Yes | Our code is available at https://github.com/EleutherAI/concept-erasure. |
| Open Datasets | Yes | We use the biographies dataset of De-Arteaga et al. [6]... We collect sentences and their coarse POS tags... from the English Universal Dependencies dataset [27]... For each model family, we use a sample from the respective pretraining distribution: the validation split of the Pile [13] for the Pythia models [2], and the RedPajama replication of the LLaMA pretraining corpus for the LLaMA family [39]. |
| Dataset Splits | No | The paper mentions using a 'validation split of the Pile' for Pythia models and 'sampling a slice of 2^22 tokens for fitting the LEACE parameters and another slice of 2^22 tokens for evaluation.' However, it does not provide specific train/validation/test percentages or absolute sample counts for its primary experiments to ensure full reproducibility of data partitioning. |
| Hardware Specification | No | The paper states: 'We are grateful to CoreWeave for providing the compute resources used in Section 6.' However, it does not specify any details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using 'the model from the spaCy library' and 'the BERT tokenizer' but does not provide specific version numbers for these or any other key software dependencies required for replication. |
| Experiment Setup | Yes | We fit a logistic regression profession-prediction classifier over the projected [CLS] representations... For INLP, we perform 20 iterations... We apply concept erasure to the input of each transformer block, immediately after normalization is applied (LayerNorm or RMSNorm). We use a sample from the respective pretraining distribution: the validation split of the Pile [13]... and another slice of 2^22 tokens for evaluation. |
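
The Research Type and Open Source Code rows above describe LEACE as a closed-form linear eraser released in the concept-erasure library. As a rough illustration of the whiten-project-unwhiten construction the paper refers to, here is a minimal NumPy sketch; the function name and numerical details are ours, it is not the library's API, and actual replication should use the released package.

```python
import numpy as np

def fit_leace_eraser(X, Z, tol=1e-8):
    """Minimal sketch of a whiten-project-unwhiten linear concept eraser.

    X: (n, d) representations; Z: (n, k) concept labels (e.g. one-hot).
    Returns erase(x), mapping (m, d) arrays to concept-erased arrays.
    Illustrative only; not the concept-erasure library's interface.
    """
    n, d = X.shape
    mu_x = X.mean(axis=0)
    Xc = X - mu_x
    Zc = Z - Z.mean(axis=0)

    # Whitening matrix W ~ Sigma_xx^{-1/2} and its pseudo-inverse W^+.
    sigma_xx = Xc.T @ Xc / n
    vals, vecs = np.linalg.eigh(sigma_xx)
    keep = vals > tol
    W = (vecs[:, keep] * vals[keep] ** -0.5) @ vecs[:, keep].T
    W_pinv = (vecs[:, keep] * vals[keep] ** 0.5) @ vecs[:, keep].T

    # Orthogonal projection onto the column space of W @ Sigma_xz.
    sigma_xz = Xc.T @ Zc / n
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol]
    P = U @ U.T

    # Affine eraser: r(x) = mu_x + (I - W^+ P W)(x - mu_x).
    A = np.eye(d) - W_pinv @ P @ W

    def erase(x):
        return mu_x + (x - mu_x) @ A.T

    return erase
```

The intended property, per the paper's framing of "perfect linear concept erasure", is that after erasure no linear classifier can predict Z from the representations better than a constant predictor.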
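The Pseudocode and Experiment Setup rows mention Algorithm 1 (concept scrubbing) and erasure applied to each transformer block's input immediately after normalization. The sketch below only shows the hook mechanics in PyTorch with pre-fitted per-layer erasers; the paper's Algorithm 1 additionally fits each layer's eraser sequentially on activations that have already been scrubbed at earlier layers, and selecting modules by name here is a simplifying assumption.

```python
from torch import nn

def scrub_with_hooks(model: nn.Module, erasers: dict):
    """Attach per-layer erasers to normalization modules via forward hooks.

    `erasers` maps module names (e.g. the LayerNorm/RMSNorm at each block
    input) to callables x -> x_erased. Hypothetical helper, not the
    concept-erasure library's API.
    """
    handles = []
    for name, module in model.named_modules():
        if name in erasers:
            eraser = erasers[name]

            def hook(mod, inputs, output, eraser=eraser):
                # Returning a value from a forward hook replaces the output,
                # so downstream layers see the erased activations.
                return eraser(output)

            handles.append(module.register_forward_hook(hook))
    return handles  # call h.remove() on each handle to undo scrubbing
```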
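The Experiment Setup row also quotes a logistic-regression profession probe fitted over projected [CLS] representations. A hedged sketch of that evaluation loop, with scikit-learn defaults standing in for hyperparameters the paper does not report:

```python
from sklearn.linear_model import LogisticRegression

def probe_accuracy(train_x, train_y, test_x, test_y, eraser=None):
    """Fit a profession probe on (optionally erased) [CLS] embeddings
    and report test accuracy. Illustrative settings, not the paper's."""
    if eraser is not None:
        train_x, test_x = eraser(train_x), eraser(test_x)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_x, train_y)
    return clf.score(test_x, test_y)
```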