Learning from others' mistakes: Avoiding dataset biases without modeling them

Authors: Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach in various settings ranging from toy datasets up to large crowd-sourced benchmarks: controlled synthetic bias setup (He et al., 2019; Clark et al., 2019), natural language inference (McCoy et al., 2019b), extractive question answering (Jia & Liang, 2017) and fact verification (Schuster et al., 2019).
Researcher Affiliation | Collaboration | Victor Sanh¹, Thomas Wolf¹, Yonatan Belinkov², Alexander M. Rush¹ (¹Hugging Face, ²Technion - Israel Institute of Technology)
Pseudocode | No | The paper describes its methods verbally but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper states 'Our code is based on the Hugging Face Transformers library (Wolf et al., 2019)' but does not provide a link to, or an explicit statement about releasing, the source code for the methodology described in this paper.
Open Datasets | Yes | MNLI (Williams et al., 2018) is the canonical large-scale English dataset to study this problem with 433K labeled examples.
Dataset Splits | Yes | For evaluation, it features matched sets (examples from domains encountered in training) and mismatched sets (domains not seen during training). A data-loading sketch follows the table.
Hardware Specification | Yes | All of our experiments are conducted on a single 16GB V100 using half-precision training for speed.
Software Dependencies | No | The paper mentions 'Our code is based on the Hugging Face Transformers library (Wolf et al., 2019)' but does not specify its version number or the versions of other key software components such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We use the following hyper-parameters: 3 epochs of training with a learning rate of 3e-5, and a batch size of 32. The learning rate is linearly increased for 2000 warming steps and linearly decreased to 0 afterward. We use an Adam optimizer (β = (0.9, 0.999), ϵ = 1e-8) and add a weight decay of 0.1.
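
To make the matched/mismatched evaluation setup quoted in the Dataset Splits row concrete, the sketch below loads MNLI with the Hugging Face `datasets` library. The use of this library and the hub split names are assumptions for illustration only; the paper does not state how the data was loaded.

```python
# Minimal sketch, assuming the Hugging Face `datasets` library (not stated in the paper).
# Split names follow the hub's MNLI dataset card.
from datasets import load_dataset

mnli = load_dataset("multi_nli")
train = mnli["train"]                        # labeled training examples
matched = mnli["validation_matched"]         # evaluation: domains seen during training
mismatched = mnli["validation_mismatched"]   # evaluation: domains not seen during training

print(len(train), len(matched), len(mismatched))
```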
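
The Experiment Setup and Hardware Specification rows together specify a full optimizer and scheduler configuration. A hedged sketch using Hugging Face `TrainingArguments` is shown below; the `Trainer`-style API and the `output_dir` value are illustrative assumptions, while the numeric values are the ones quoted above.

```python
# Sketch of the quoted hyper-parameters expressed as Hugging Face TrainingArguments.
# The TrainingArguments API and output_dir are assumptions; only the numbers come from the paper.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder path
    num_train_epochs=3,
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    lr_scheduler_type="linear",        # linear decay to 0 after warm-up
    warmup_steps=2000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    weight_decay=0.1,
    fp16=True,                         # half-precision, as run on a single 16GB V100
)
```

Whether the paper uses plain Adam or the decoupled AdamW variant is not stated; the sketch simply inherits the `Trainer` default while matching the quoted β, ϵ, and weight-decay values.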