Linear Adversarial Concept Erasure
Authors: Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan D Cotterell
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When evaluated in the context of binary gender removal, our method recovers a low-dimensional subspace whose removal mitigates bias by intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, Bar-Ilan University; 2 Allen Institute for Artificial Intelligence; 3 Independent researcher; 4 ETH Zürich. |
| Pseudocode | Yes | Algorithm 1 Relaxed Linear Adversarial Concept Erasure (R-LACE) |
| Open Source Code | Yes | https://github.com/shauli-ravfogel/rlace-icml |
| Open Datasets | Yes | Our bias mitigation target is the uncased version of the GloVe word vectors (Pennington et al., 2014), and we use the training and test data of Ravfogel et al. (2020)... |
| Dataset Splits | Yes | We use the same train/dev/test split of Ravfogel et al. (2020), but discard the gender-neutral words (i.e., we cast the problem as binary classification). We end up with a training set, evaluation set and test set of sizes 7,350, 3,150 and 4,500, respectively. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions using "Sklearn" and "Hugging Face implementation" but does not provide specific version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We run Alg. 1 for 50,000 iterations with the cross entropy loss, alternating between an update to the adversary and to the classifier after each iteration (T = 50,000, M = 1 in Alg. 1)... We train with a simple SGD, with a learning rate of 0.005, chosen by experimenting with the development set. We use a batch size of 128. A minimal sketch of this alternating loop appears after the table. |
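
The authors' implementation is available in the repository linked above. As a rough illustration of the setup quoted in the table, the sketch below shows an alternating minimax loop of the kind Algorithm 1 (R-LACE) describes: a logistic concept classifier is trained on projected inputs while a relaxed projection matrix is updated in the opposite direction and projected back onto the Fantope after each adversary step. Only the hyperparameters (T = 50,000, M = 1, SGD with learning rate 0.005, batch size 128) come from the quoted setup; the helper `fantope_projection`, the PyTorch optimizers, and all variable names are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of an R-LACE-style alternating optimization (not the authors' implementation).
# Hyperparameters follow the quoted experiment setup; everything else is an assumption.
import torch

def fantope_projection(A, rank_to_keep):
    """Project a symmetric matrix onto {P : 0 <= P <= I, trace(P) = rank_to_keep}
    via eigenvalue shifting and clamping (standard construction, for illustration)."""
    A = (A + A.T) / 2
    eigvals, eigvecs = torch.linalg.eigh(A)
    # Bisect over a scalar shift so the clamped eigenvalues sum to rank_to_keep.
    lo, hi = eigvals.min() - 1.0, eigvals.max() + 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        s = torch.clamp(eigvals - mid, 0.0, 1.0).sum()
        lo, hi = (mid, hi) if s > rank_to_keep else (lo, mid)
    clamped = torch.clamp(eigvals - (lo + hi) / 2, 0.0, 1.0)
    return eigvecs @ torch.diag(clamped) @ eigvecs.T

def rlace_sketch(X, y, k=1, T=50_000, lr=0.005, batch_size=128):
    """X: (n, d) float tensor of representations; y: (n,) binary concept labels.
    Returns a relaxed rank-(d-k)-trace projection P and the final classifier theta."""
    d = X.shape[1]
    theta = torch.zeros(d, requires_grad=True)   # concept classifier
    P = torch.eye(d, requires_grad=True)          # relaxed projection (adversary)
    opt_theta = torch.optim.SGD([theta], lr=lr)
    opt_P = torch.optim.SGD([P], lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(T):
        idx = torch.randint(0, X.shape[0], (batch_size,))
        xb, yb = X[idx], y[idx].float()
        # Classifier step (M = 1): minimize the loss on projected inputs.
        opt_theta.zero_grad()
        loss_fn((xb @ P) @ theta, yb).backward()
        opt_theta.step()
        # Adversary step: ascend the same loss w.r.t. P, then project onto the Fantope.
        opt_P.zero_grad()
        (-loss_fn((xb @ P) @ theta, yb)).backward()
        opt_P.step()
        with torch.no_grad():
            P.copy_(fantope_projection(P, d - k))
    return P.detach(), theta.detach()
```

After training, the erased representations would be obtained by multiplying the inputs with the learned projection (e.g., `X @ P` applied to the word vectors mentioned above); this usage pattern is an assumption based on the method's description, not a quote from the paper.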