What is the Effect of Importance Weighting in Deep Learning?

Authors: Jonathon Byrd, Zachary Lipton

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally confirm these findings across a range of architectures and datasets.
Researcher Affiliation | Academia | Jonathon Byrd¹, Zachary C. Lipton¹ (¹Carnegie Mellon University).
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link for the release of its source code.
Open Datasets | Yes | We investigate the effects of importance weighting on neural networks on two-dimensional toy datasets, the CIFAR-10 image dataset, and the Microsoft Research Paraphrase Corpus (MRPC) text dataset. Here, we train a binary classifier on training images labeled as cats or dogs (5000 per class)... We conduct similar experiments on (sequential) natural language data using the Microsoft Research Paraphrase Corpus (MRPC) (Dolan & Brockett, 2005). (See the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions training and testing data but does not explicitly describe a separate validation split or how it was handled for reproducibility.
Hardware Specification | No | The paper mentions training models but does not specify any particular hardware used for experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015), fine-tuning the BERT-Base model (Devlin et al., 2018), and adapting code from Wolf & Sanh (2018), but does not provide specific version numbers for any software libraries or dependencies (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For L2 regularization, we set the penalty coefficient as 0.001, and when using dropout on deep networks, we set the values of hidden units to 0 during training with probability 1/2. The models are trained for 1000 epochs using minibatch SGD with a batch size of 16 and no momentum. All models trained with SGD use a constant learning rate of 0.1, except for the dropout models with no importance weighting, which used a learning rate of 0.05 due to weight divergence issues. We also ran experiments with the Adam optimizer (Kingma & Ba, 2015) with learning rate 1e-4, β1 = 0.9, β2 = 0.999, and ϵ = 1e-8 (Figure A.9). (See the training-loop sketch after the table.)
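To make the Open Datasets row above concrete, here is a minimal sketch, not the authors' code, of how the binary cat-vs-dog subset of CIFAR-10 described in the quote (5000 training images per class) could be built with torchvision. The class indices 3 (cat) and 5 (dog) follow the standard CIFAR-10 labeling; the CatsVsDogs wrapper name and the data root path are illustrative assumptions.

```python
import torch
from torch.utils.data import Dataset
from torchvision import datasets, transforms

CAT, DOG = 3, 5  # standard CIFAR-10 class indices for cat and dog


class CatsVsDogs(Dataset):
    """CIFAR-10 restricted to cat/dog images, relabeled 0 (cat) / 1 (dog)."""

    def __init__(self, train=True, root="./data"):
        self.base = datasets.CIFAR10(root=root, train=train, download=True,
                                     transform=transforms.ToTensor())
        # Keep only the cat and dog images; each class has 5000 training images.
        self.items = [(i, 0 if y == CAT else 1)
                      for i, y in enumerate(self.base.targets)
                      if y in (CAT, DOG)]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        base_idx, label = self.items[idx]
        image, _ = self.base[base_idx]  # transform already applied by the base dataset
        return image, label
```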
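Similarly, the Experiment Setup row can be read as a concrete training configuration. The sketch below, under stated assumptions, wires up the quoted hyperparameters: minibatch SGD with batch size 16, constant learning rate 0.1, no momentum, and an L2 penalty of 0.001 (expressed here via SGD's weight_decay), with per-example importance weights multiplied into the loss. The linear placeholder model, the random placeholder data, and the two-class weight vector are illustrative assumptions, not the paper's architectures or weighting schemes.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for 3x32x32 images with binary labels (assumption).
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.0,
                            weight_decay=0.001)  # L2 penalty coefficient 0.001, as quoted
# Alternative quoted in the table: Adam with lr=1e-4, betas=(0.9, 0.999), eps=1e-8.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

criterion = nn.CrossEntropyLoss(reduction="none")  # keep per-example losses
class_weights = torch.tensor([1.0, 4.0])  # hypothetical importance weights per class

for epoch in range(1000):  # 1000 epochs, as quoted
    for x, y in train_loader:
        optimizer.zero_grad()
        per_example_loss = criterion(model(x), y)
        # Importance weighting: scale each example's loss by its class weight
        # before averaging over the minibatch.
        loss = (class_weights[y] * per_example_loss).mean()
        loss.backward()
        optimizer.step()
```

For purely class-level weights, nn.CrossEntropyLoss also accepts a weight argument; the explicit per-example multiplication above is used only to make the weighting step visible.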