AUXILIARY TASK UPDATE DECOMPOSITION: THE GOOD, THE BAD AND THE NEUTRAL
Authors: Lucio M. Dery, Yann Dauphin, David Grangier
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare ATTITTUD with previous methods on a variety of tasks and domains. We rely on both text and image classification tasks to conduct our analysis. We also present ablation experiments to explain the impact of hyper-parameter selection. We make code for ATTITTUD and related experiments available on github. |
| Researcher Affiliation | Collaboration | Lucio M. Dery, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA; Yann Dauphin, Google Research; David Grangier, Google Research |
| Pseudocode | Yes | Algorithm 1: ATTITTUD: Construct Auxiliary Task Surrogate Gradient (a hedged sketch of one possible surrogate-gradient construction follows the table). |
| Open Source Code | Yes | We make code for ATTITTUD and related experiments available on github. Code available here: https://github.com/ldery/ATTITTUD |
| Open Datasets | Yes | We consider the Amazon Helpfulness (McAuley et al., 2015) and IMDb Movie Review (Maas et al., 2011) tasks. We use the CIFAR-100 dataset (Krizhevsky et al., 2009). We use 5k training examples from the CheXpert dataset (Irvin et al., 2019). |
| Dataset Splits | Yes | The Amazon Helpfulness task splits text reviews into 115k/5k/25k documents for the train-validation-test split whilst the IMDb Review dataset has a 20k/5k/25k split. For Multi-CIFAR100, unlike Rosenbaum et al. (2017); Yu et al. (2020) who use a 500-100 train-test split for examples under each fine-grained CIFAR-100 label, we include a validation set and therefore opt for a 400-100-100 train-validation-test split. For Cat-vs-Dog, we use 100 examples from the training set as validation and test on all 1000 test examples per-class. (A sketch of the per-class CIFAR-100 split follows the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. It only mentions general terms like 'training of large neural networks'. |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2017)' but does not specify a version number for it or other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For all our experiments, we select the auxiliary task control parameters η_aux within {(1.0, 1.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)} for ease of interpretability. For image classification experiments, we perform pre-training with a learning rate of 1e-4 for all experiments and a finetuning learning rate of 5e-4. We use the Adam optimizer (Kingma & Ba, 2014) with β = (0.9, 0.999). We clip all gradient norms to 1.0 before performing gradient descent. We cross-validated dropout rates within the set {0.05, 0.1, 0.2, 0.3} for both pre-training and finetuning steps. (A sketch of this optimizer configuration follows the table.) |
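
The Pseudocode row only names Algorithm 1; the paper's full procedure is not reproduced here. The sketch below is one possible reading, assuming (per the paper's title and the three-entry η_aux control parameters quoted above) that the auxiliary-task gradient is decomposed into a component aligned with the primary-task gradient ("good"), a component opposed to it ("bad"), and an orthogonal remainder ("neutral"), each rescaled by one entry of η_aux. The function and variable names are illustrative, not the authors'; the actual Algorithm 1 operates on a subspace of per-example primary-task gradients.

```python
import torch

def surrogate_aux_grad(primary_grad, aux_grad, eta_aux=(1.0, 1.0, 1.0)):
    """Hedged sketch: split an auxiliary-task gradient into components aligned
    with ('good'), opposed to ('bad'), and orthogonal to ('neutral') the
    primary-task gradient, then recombine them with the eta_aux weights.
    This illustrates the decomposition idea only; it is not the paper's
    Algorithm 1."""
    p = primary_grad.flatten()
    a = aux_grad.flatten()
    # Project the auxiliary gradient onto the primary gradient direction.
    coeff = torch.dot(a, p) / (torch.dot(p, p) + 1e-12)
    parallel = coeff * p
    neutral = a - parallel                              # orthogonal remainder
    good = parallel if coeff >= 0 else torch.zeros_like(p)
    bad = parallel if coeff < 0 else torch.zeros_like(p)
    eta_good, eta_bad, eta_neutral = eta_aux
    combined = eta_good * good + eta_bad * bad + eta_neutral * neutral
    return combined.view_as(aux_grad)
```

Under this reading, setting an entry of η_aux to 0.0, as in the paper's search grid, simply drops the corresponding component from the surrogate update.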
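
As a concrete reading of the Dataset Splits row, the sketch below carves 400 training and 100 validation indices per fine-grained label out of the 500 CIFAR-100 training images per class, leaving the official 100 test images per class as the test set. The torchvision usage, seed, and helper name are assumptions, not taken from the paper.

```python
import random
from collections import defaultdict
from torchvision.datasets import CIFAR100

def per_class_split(root="./data", seed=0):
    """Hypothetical 400-100-100 per-class split of CIFAR-100 (train/val from the
    official training set; the official test set is kept as-is)."""
    train_set = CIFAR100(root, train=True, download=True)
    by_label = defaultdict(list)
    for idx, label in enumerate(train_set.targets):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label, indices in by_label.items():
        rng.shuffle(indices)
        train_idx.extend(indices[:400])    # 400 training examples per class
        val_idx.extend(indices[400:500])   # 100 validation examples per class
    return train_idx, val_idx
```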
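
The optimizer settings quoted in the Experiment Setup row translate into a short PyTorch configuration. The model and the shape of the training step below are placeholders, but the learning rates (1e-4 pre-training, 5e-4 finetuning), the Adam betas (0.9, 0.999), and the gradient-norm clipping at 1.0 follow the quoted text.

```python
import torch
from torch import nn

model = nn.Linear(512, 100)  # placeholder model; the paper uses larger networks

def make_optimizer(parameters, stage="pretrain"):
    # Pre-training uses lr=1e-4 and finetuning lr=5e-4, both with Adam
    # betas (0.9, 0.999), as quoted in the Experiment Setup row.
    lr = 1e-4 if stage == "pretrain" else 5e-4
    return torch.optim.Adam(parameters, lr=lr, betas=(0.9, 0.999))

optimizer = make_optimizer(model.parameters(), stage="pretrain")

def training_step(batch_loss):
    optimizer.zero_grad()
    batch_loss.backward()
    # Clip all gradient norms to 1.0 before the descent step, as stated in the paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

Dropout rates are not fixed here because the paper cross-validates them over {0.05, 0.1, 0.2, 0.3} separately for pre-training and finetuning.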