Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
LEACE: Perfect linear concept erasure in closed form
Authors: Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply LEACE to large language models with a novel procedure called concept scrubbing, which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings. |
| Researcher Affiliation | Collaboration | 1EleutherAI 2Bar-Ilan University 3ETH Zürich 4Booz Allen Hamilton |
| Pseudocode | Yes | Algorithm 1 Concept scrubbing |
| Open Source Code | Yes | Our code is available at https://github.com/EleutherAI/concept-erasure. |
| Open Datasets | Yes | We use the biographies dataset of De-Arteaga et al. [6]... We collect sentences and their coarse POS tags... from the English Universal Dependencies dataset [27]... For each model family, we use a sample from the respective pretraining distribution: the validation split of the Pile [13] for the Pythia models [2], and the RedPajama replication of the LLaMA pretraining corpus for the LLaMA family [39]. |
| Dataset Splits | No | The paper mentions using a 'validation split of the Pile' for Pythia models and 'sampling a slice of 2^22 tokens for fitting the LEACE parameters and another slice of 2^22 tokens for evaluation.' However, it does not provide specific train/validation/test percentages or absolute sample counts for its primary experiments to ensure full reproducibility of data partitioning. |
| Hardware Specification | No | The paper states: 'We are grateful to CoreWeave for providing the compute resources used in Section 6.' However, it does not specify any details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using 'the model from the spaCy library' and 'the BERT tokenizer' but does not provide specific version numbers for these or any other key software dependencies required for replication. |
| Experiment Setup | Yes | We fit a logistic regression profession-prediction classifier over the projected [CLS] representations... For INLP, we perform 20 iterations... We apply concept erasure to the input of each transformer block, immediately after normalization is applied (Layer Norm or RMSNorm). We use a sample from the respective pretraining distribution: the validation split of the Pile [13]... and another slice of 2^22 tokens for evaluation. |
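For orientation, the closed-form linear eraser named in the paper's title can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the stated idea (whiten the features, project out the directions spanned by the whitened feature–concept cross-covariance, then un-whiten), not the authors' implementation; the function names `leace_fit` and `leace_apply` are ours, and the released `concept-erasure` library linked above should be preferred in practice.

```python
import numpy as np

def leace_fit(X, Z, tol=1e-10):
    """Fit a LEACE-style eraser from samples (illustrative sketch).

    X: (n, d) feature matrix; Z: (n, k) concept labels (e.g. one-hot).
    Returns (P, mu) such that the eraser is x -> x - P @ (x - mu).
    """
    mu = X.mean(axis=0)
    Xc, Zc = X - mu, Z - Z.mean(axis=0)
    n = X.shape[0]
    sigma_xx = Xc.T @ Xc / n            # feature covariance
    sigma_xz = Xc.T @ Zc / n            # cross-covariance with the concept
    # Whitening W = Sigma_XX^{-1/2} and its pseudo-inverse W+ = Sigma_XX^{1/2}
    vals, vecs = np.linalg.eigh(sigma_xx)
    keep = vals > tol
    W = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T
    W_pinv = (vecs[:, keep] * np.sqrt(vals[keep])) @ vecs[:, keep].T
    # Orthogonal projection onto colspace(W @ Sigma_XZ) in whitened coordinates
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol]
    P = W_pinv @ (U @ U.T) @ W          # oblique projection back in feature space
    return P, mu

def leace_apply(X, P, mu):
    """Erase the concept: r(x) = x - P (x - mu)."""
    return X - (X - mu) @ P.T
```

Fitting on a sample and applying the eraser drives the empirical cross-covariance between the erased features and the concept labels to (numerically) zero, which is the linear-erasure guarantee the table's "Experiment Setup" row refers to when LEACE parameters are fit on one slice of data.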