Feature-Level Debiased Natural Language Understanding

Authors: Yougang Lyu, Piji Li, Yechang Yang, Maarten de Rijke, Pengjie Ren, Yukun Zhao, Dawei Yin, Zhaochun Ren

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance." |
| Researcher Affiliation | Collaboration | School of Computer Science and Technology, Shandong University, Qingdao, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; University of Amsterdam, Amsterdam, The Netherlands; Baidu Inc., Beijing, China |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | "The code is available at https://github.com/youganglyu/DCT" |
| Open Datasets | Yes | MNLI (Williams, Nangia, and Bowman 2018); SNLI (Bowman et al. 2015); FEVER (Thorne et al. 2018). A loading sketch follows the table. |
| Dataset Splits | Yes | "We evaluate the in-distribution and out-of-distribution performance of models on the development set and the corresponding challenge set of each dataset." |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU, memory) used for running the experiments are provided. |
| Software Dependencies | No | The paper names software such as BERT-base and the AdamW optimizer but provides no version numbers for dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | "For the MNLI, SNLI and FEVER datasets, we train all models for 5 epochs; all models converge. [...] we adopt the AdamW (Loshchilov and Hutter 2019) optimizer with initial learning rate 3e-5. Meanwhile, the temperature parameter τ, threshold λ, momentum coefficient m, and scalar weighting hyperparameter α are set to 0.04, 0.6, 0.999, and 0.1. The sizes of the least similar positive samples S_p and the most similar negative samples S_dn are set to 150 and 1." A configuration sketch follows the table. |
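
All three benchmark datasets are publicly available. The sketch below shows one way to pull them with the Hugging Face `datasets` library; the loader identifiers are an assumption, since the paper and its repository may obtain the data differently.

```python
from datasets import load_dataset

# Assumed Hugging Face dataset identifiers; the DCT repository may fetch its own copies.
mnli = load_dataset("multi_nli")       # Williams, Nangia, and Bowman 2018
snli = load_dataset("snli")            # Bowman et al. 2015
fever = load_dataset("fever", "v1.0")  # Thorne et al. 2018

# In-distribution evaluation uses each dataset's development split,
# e.g. MNLI's matched validation set.
print(mnli["validation_matched"][0]["premise"])
```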
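
The Experiment Setup row maps directly onto a training configuration. The following is a minimal, hypothetical PyTorch sketch of how those hyperparameters could be wired up; the `bert-base-uncased` checkpoint and the MoCo-style momentum update are assumptions, as the paper only names BERT-base and a momentum coefficient m.

```python
import torch
from transformers import BertModel

# Hyperparameters as reported in the paper.
EPOCHS = 5            # all models converge within 5 epochs
LR = 3e-5             # initial learning rate for AdamW
TAU = 0.04            # temperature parameter τ
THRESHOLD = 0.6       # threshold λ
MOMENTUM_M = 0.999    # momentum coefficient m
ALPHA = 0.1           # scalar weighting hyperparameter α
S_P, S_DN = 150, 1    # least similar positives / most similar negatives

# BERT-base encoder (checkpoint name is an assumption) and AdamW, as in the paper.
encoder = BertModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(encoder.parameters(), lr=LR)

# A momentum coefficient of 0.999 suggests a MoCo-style key encoder updated by
# exponential moving average; this helper is an assumption about DCT's design.
@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=MOMENTUM_M):
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.data.mul_(m).add_(q.data, alpha=1.0 - m)
```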