Feature-Level Debiased Natural Language Understanding
Authors: Yougang Lyu, Piji Li, Yechang Yang, Maarten de Rijke, Pengjie Ren, Yukun Zhao, Dawei Yin, Zhaochun Ren
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance. |
| Researcher Affiliation | Collaboration | 1School of Computer Science and Technology, Shandong University, Qingdao, China 2College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China 3University of Amsterdam, Amsterdam, The Netherlands 4Baidu Inc., Beijing, China |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code is available at https://github.com/youganglyu/DCT |
| Open Datasets | Yes | MNLI (Williams, Nangia, and Bowman 2018); SNLI (Bowman et al. 2015); FEVER (Thorne et al. 2018) |
| Dataset Splits | Yes | We evaluate the in-distribution and out-of-distribution performance of models on the development set and the corresponding challenge set of each dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU, memory) used for running experiments are provided. |
| Software Dependencies | No | The paper mentions software such as BERT-base and the AdamW optimizer but does not provide version numbers for software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For the MNLI, SNLI and FEVER datasets, we train all models for 5 epochs; all models converge. [...] We adopt the AdamW (Loshchilov and Hutter 2019) optimizer with an initial learning rate of 3e-5. The temperature parameter τ, threshold λ, momentum coefficient m, and scalar weighting hyperparameter α are set to 0.04, 0.6, 0.999, and 0.1, respectively. The sizes of the least similar positive samples Sp and the most similar negative samples Sdn are set to 150 and 1. |
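The reported experiment setup can be collected into a small configuration sketch. This is a hypothetical illustration, not code from the DCT repository: the class and function names are invented, and the momentum update shown is the generic MoCo-style rule commonly associated with a momentum coefficient m; whether DCT applies exactly this update is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DCTConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    epochs: int = 5              # training epochs on MNLI, SNLI, FEVER
    learning_rate: float = 3e-5  # AdamW initial learning rate
    temperature: float = 0.04    # contrastive temperature tau
    threshold: float = 0.6       # threshold lambda
    momentum: float = 0.999      # momentum coefficient m
    alpha: float = 0.1           # scalar weighting hyperparameter alpha
    num_positive: int = 150      # size of least similar positive samples Sp
    num_negative: int = 1        # size of most similar negative samples Sdn


def momentum_update(key_params, query_params, m):
    """MoCo-style momentum update: p_k <- m * p_k + (1 - m) * p_q.

    Shown here on plain floats for clarity; in practice this would be
    applied elementwise to encoder parameter tensors.
    """
    return [m * pk + (1.0 - m) * pq for pk, pq in zip(key_params, query_params)]


cfg = DCTConfig()
updated = momentum_update([1.0], [0.0], cfg.momentum)
```

With m = 0.999, the key parameters move only 0.1% of the way toward the query parameters per step, which is why the momentum encoder evolves slowly and stably.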