LEACE: Perfect linear concept erasure in closed form

Authors: Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply LEACE to large language models with a novel procedure called concept scrubbing, which erases target concept information from every layer in the network. We demonstrate our method on two tasks: measuring the reliance of language models on part-of-speech information, and reducing gender bias in BERT embeddings.
Researcher Affiliation | Collaboration | 1EleutherAI 2Bar-Ilan University 3ETH Zürich 4Booz Allen Hamilton {nora,stella}@eleuther.ai david@davidsj.com
Pseudocode | Yes | Algorithm 1: Concept scrubbing. (A minimal sketch of this procedure appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/EleutherAI/concept-erasure.
Open Datasets | Yes | We use the biographies dataset of De-Arteaga et al. [6]... We collect sentences and their coarse POS tags... from the English Universal Dependencies dataset [27]... For each model family, we use a sample from the respective pretraining distribution: the validation split of the Pile [13] for the Pythia models [2], and the RedPajama replication of the LLaMA pretraining corpus for the LLaMA family [39].
Dataset Splits | No | The paper mentions using a 'validation split of the Pile' for Pythia models and 'sampling a slice of 2^22 tokens for fitting the LEACE parameters and another slice of 2^22 tokens for evaluation.' However, it does not provide specific train/validation/test percentages or absolute sample counts for its primary experiments, so the data partitioning cannot be fully reproduced.
Hardware Specification | No | The paper states: 'We are grateful to CoreWeave for providing the compute resources used in Section 6.' However, it does not specify any details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments.
Software Dependencies | No | The paper mentions using 'the model from the spaCy library' and 'the BERT tokenizer' but does not provide specific version numbers for these or any other key software dependencies required for replication.
Experiment Setup | Yes | We fit a logistic regression profession-prediction classifier over the projected [CLS] representations... For INLP, we perform 20 iterations... We apply concept erasure to the input of each transformer block, immediately after normalization is applied (LayerNorm or RMSNorm). We use a sample from the respective pretraining distribution: the validation split of the Pile [13]... and another slice of 2^22 tokens for evaluation. (A sketch of fitting the closed-form LEACE eraser follows the table.)
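
To make the "LEACE parameters" mentioned above concrete, here is a minimal NumPy sketch of our reading of the closed-form eraser, r(x) = x − W⁺ P W (x − μ), where W is a ZCA whitening transform of the features X and P is the orthogonal projection onto the column space of W Σ_XZ. The function name `fit_leace`, the `tol` threshold, and the use of a pre-one-hot-encoded label matrix are our own choices for illustration; the authors' reference implementation is the concept-erasure repository linked in the table.

```python
import numpy as np

def fit_leace(X, Z, tol=1e-8):
    """Sketch of a closed-form LEACE-style eraser.

    X: (n, d) feature matrix; Z: (n, k) concept labels (e.g. one-hot).
    Returns a callable applying r(x) = x - W_pinv @ P @ W @ (x - mu_x).
    """
    mu_x = X.mean(axis=0)
    Xc = X - mu_x
    Zc = Z - Z.mean(axis=0)

    # Covariance of X and cross-covariance between X and the concept labels.
    sigma_xx = Xc.T @ Xc / len(X)
    sigma_xz = Xc.T @ Zc / len(X)

    # ZCA whitening W = Sigma_XX^{-1/2} and its pseudo-inverse.
    vals, vecs = np.linalg.eigh(sigma_xx)
    inv_sqrt = np.where(vals > tol, vals, np.inf) ** -0.5
    W = vecs @ np.diag(inv_sqrt) @ vecs.T
    W_pinv = vecs @ np.diag(np.where(vals > tol, vals, 0.0) ** 0.5) @ vecs.T

    # Orthogonal projection onto colspace(W @ Sigma_XZ).
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol]
    P = U @ U.T

    A = W_pinv @ P @ W  # low-rank correction applied at erasure time
    return lambda x: x - (x - mu_x) @ A.T
```

Usage under these assumptions: `erase = fit_leace(hidden_states, one_hot_labels)` followed by `scrubbed = erase(hidden_states)` removes the linearly available concept while leaving the representations otherwise minimally changed.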
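The concept-scrubbing procedure (Algorithm 1) can likewise be sketched as a sequential loop: an eraser is fitted for each transformer block in turn, so that every eraser sees activations already scrubbed at earlier layers. The names `embed`, `blocks`, `batches`, and `fit_eraser` below are placeholders (the model's embedding layer, its list of blocks, a stream of token batches, and an eraser-fitting routine such as a torch analogue of the NumPy sketch above), not the paper's API; in the paper the eraser is applied to the block input immediately after LayerNorm/RMSNorm, a detail this simplified loop glosses over.

```python
import torch

@torch.no_grad()
def concept_scrub(embed, blocks, batches, labels, fit_eraser):
    """Sketch of concept scrubbing: fit one eraser per block, layer by layer.

    `batches` is an iterable of token-id tensors and `labels` holds the matching
    per-token concept labels (e.g. one-hot POS tags) of shape (batch, seq, k).
    """
    # Hidden states entering the first block, one tensor per batch.
    hiddens = [embed(b) for b in batches]
    erasers = []

    for block in blocks:
        # Fit an eraser on this layer's inputs, which have already been
        # scrubbed by the erasers fitted at all earlier layers.
        feats = torch.cat([h.flatten(0, 1) for h in hiddens])
        concept = torch.cat([z.flatten(0, 1) for z in labels])
        eraser = fit_eraser(feats, concept)
        erasers.append(eraser)

        # Scrub the inputs, then advance the stream through the block.
        hiddens = [block(eraser(h)) for h in hiddens]

    return erasers
```

The key design point the sketch illustrates is the sequential dependence: because erasing at one layer changes every downstream activation, the erasers cannot all be fitted in a single pass over unmodified activations.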