Debiasing Attention Mechanism in Transformer without Demographics
Authors: Shenyu Lu, Yipei Wang, Xiaoqian Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments in computer vision and natural language processing tasks and show that our method is comparable and even outperforms the state-of-the-art method with substantially lower energy consumption. We conduct extensive experiments on real-world datasets, encompassing various classification tasks in computer vision and natural language processing (NLP) fields. |
| Researcher Affiliation | Academia | Shenyu Lu, Yipei Wang & Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47906, USA {lu876,wang4865,joywang}@purdue.edu |
| Pseudocode | Yes | We summarized our method in an algorithm, detailed in Appendix G. Algorithm 1 Debias Attention mechanism |
| Open Source Code | Yes | To reproduce our experiment, we have made the code available at https://github.com/lu876/Debiasing-Attention-Mechanism-in-Transformer-without-Demographics. |
| Open Datasets | Yes | We test all methods on two real-world datasets: CelebA (Liu et al., 2015), and UTK (Zhang & Qi, 2017). ... utilizing both the HateXplain (Mathew et al., 2021) and MultiNLI (Williams et al., 2017) datasets. |
| Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter tuning and model selection, such as 'We save the model that achieves the highest validation accuracy.' (Appendix E), but does not explicitly provide the split percentages or sample counts for the training, validation, and test sets. |
| Hardware Specification | Yes | We train all methods on a single NVIDIA RTX-3090 GPU with 24576 MiB memory. |
| Software Dependencies | No | The paper mentions using the 'Huggingface library' and the 'AdamW' optimizer but does not specify version numbers for these or for any other software dependencies. |
| Experiment Setup | Yes | For CelebA and UTK, we take the AdamW as the optimizer with a learning rate of 10^-4, and no scheduler is applied for the fair comparison. For NLP tasks, we take the AdamW as the optimizer with a learning rate of 10^-5. We share all methods with the same batch size and optimizer configuration. We tune the hyper-parameter η on the validation set to achieve the highest accuracy. For CelebA and UTK experiments, we set η = 0.15 and η = 0.10 respectively. For HateXplain and MultiNLI, we set η = 0.25. |
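
To make the reported experiment setup concrete, the following is a minimal sketch (not the authors' released code) of how the stated optimizer and hyperparameter choices could be wired up in PyTorch. The `CONFIGS` dictionary, the dataset keys, and the `build_optimizer` helper are hypothetical names introduced here for illustration; the role of η inside the debiased attention mechanism is specific to the paper and is only recorded as a configuration value.

```python
# Minimal sketch of the reported experiment setup (AdamW, no LR scheduler,
# per-dataset learning rate and eta). This is NOT the authors' code; the
# CONFIGS dict and build_optimizer helper are hypothetical illustrations.
import torch
from torch.optim import AdamW

CONFIGS = {
    "celeba":     {"lr": 1e-4, "eta": 0.15},  # computer vision task
    "utk":        {"lr": 1e-4, "eta": 0.10},  # computer vision task
    "hatexplain": {"lr": 1e-5, "eta": 0.25},  # NLP task
    "multinli":   {"lr": 1e-5, "eta": 0.25},  # NLP task
}

def build_optimizer(model: torch.nn.Module, dataset: str) -> AdamW:
    """Return an AdamW optimizer with the reported learning rate; no scheduler is used."""
    cfg = CONFIGS[dataset]
    return AdamW(model.parameters(), lr=cfg["lr"])
```

In such a setup, `eta` would be passed to the debiasing component during training, with its value selected on the validation set as the quoted passage describes.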