Rigging the Lottery: Making All Tickets Winners
Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. |
| Researcher Affiliation | Industry | 1Google Brain, 2DeepMind. Correspondence to: Utku Evci <evcu@google.com>, Erich Elsen <eriche@google.com>. |
| Pseudocode | Yes | Algorithm 1 RigL |
| Open Source Code | Yes | Code available at github.com/google-research/rigl |
| Open Datasets | Yes | Our experiments include image classification using CNNs on the ImageNet-2012 (Russakovsky et al., 2015) and CIFAR-10 (Krizhevsky, 2009) datasets and character based language modeling using RNNs with the WikiText-103 dataset (Merity et al., 2016). |
| Dataset Splits | No | The paper mentions training on ImageNet-2012, CIFAR-10, and WikiText-103 datasets, and refers to 'validation loss' for the language modeling task. However, it does not explicitly provide specific training/validation/test split percentages, sample counts, or direct references to how the data was partitioned for each dataset (e.g., '80/10/10 split' or 'standard splits from X'). |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments, such as particular GPU or CPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions a 'Tensorflow implementation' but does not specify the version number of TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | For all dynamic sparse training methods (SET, SNFS, RigL), we use the same update schedule with ∆T = 100 and α = 0.3 unless stated otherwise. ... We set T_end to 25k for ImageNet-2012 and 75k for CIFAR-10 training... For ImageNet-2012: We set the momentum coefficient of the optimizer to 0.9, L2 regularization coefficient to 0.0001, and label smoothing to 0.1. The learning rate schedule starts with a linear warm up reaching its maximum value of 1.6 at epoch 5 which is then dropped by a factor of 10 at epochs 30, 70 and 90. We train our networks with a batch size of 4096 for 32000 steps... For CIFAR-10: The learning rate starts at 0.1 which is scaled down by a factor of 5 every 30,000 iterations. We use an L2 regularization coefficient of 5e-4, a batch size of 128 and a momentum coefficient of 0.9. |
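The "Pseudocode" row above points to Algorithm 1 (RigL). The following is a minimal NumPy sketch of that drop-and-grow mask update, assuming the cosine decay f(t) = α/2 · (1 + cos(πt/T_end)) and the ∆T, α, and T_end values quoted in the "Experiment Setup" row; function and variable names are illustrative and are not taken from the released code at github.com/google-research/rigl.

```python
# Hedged sketch of one RigL mask update (drop lowest-|w| active weights,
# grow the same number of inactive weights with the largest |gradient|).
# Intended to be called every ∆T = 100 optimizer steps, per the paper.
import numpy as np

def cosine_drop_fraction(step, alpha=0.3, t_end=25_000):
    """Fraction of active weights to update, annealed with the paper's cosine schedule."""
    if step > t_end:
        return 0.0
    return (alpha / 2.0) * (1.0 + np.cos(np.pi * step / t_end))

def rigl_update(weights, mask, dense_grad, step, alpha=0.3, t_end=25_000):
    """Return updated (weights, mask) with the same number of active connections."""
    n_active = int(mask.sum())
    n_update = int(cosine_drop_fraction(step, alpha, t_end) * n_active)
    if n_update == 0:
        return weights, mask

    # Drop: among active connections, remove those with the smallest |w|.
    active_scores = np.where(mask > 0, np.abs(weights), np.inf)
    drop_idx = np.argsort(active_scores, axis=None)[:n_update]

    # Grow: among inactive connections, enable those with the largest |grad|.
    grow_scores = np.where(mask > 0, -np.inf, np.abs(dense_grad))
    grow_idx = np.argsort(grow_scores, axis=None)[::-1][:n_update]

    new_mask = mask.flatten()
    new_weights = weights.flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    new_weights[grow_idx] = 0.0      # newly grown connections start from zero
    new_weights *= new_mask          # dropped connections no longer contribute
    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)
```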
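The ImageNet-2012 learning-rate schedule quoted in the "Experiment Setup" row can be written as a small helper. This is only a sketch of the described piecewise schedule (linear warmup to 1.6 by epoch 5, then a factor-of-10 drop at epochs 30, 70 and 90); the function name and epoch-based interface are assumptions for illustration, not code from the paper's repository.

```python
# Sketch of the reported ImageNet-2012 learning-rate schedule.
def imagenet_learning_rate(epoch: float, peak_lr: float = 1.6, warmup_epochs: int = 5) -> float:
    """Linear warmup to peak_lr by epoch 5, then a 10x drop at epochs 30, 70 and 90."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    num_drops = sum(epoch >= boundary for boundary in (30, 70, 90))
    return peak_lr * (0.1 ** num_drops)
```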