Training Deep Models Faster with Robust, Approximate Importance Sampling

Authors: Tyler B. Johnson, Carlos Guestrin

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes much more." From Section 6 (Empirical comparisons): "In this section, we demonstrate how RAIS performs in practice. We consider the very popular task of training a convolutional neural network to classify images."
Researcher Affiliation | Academia | Tyler B. Johnson, University of Washington, Seattle (tbjohns@washington.edu); Carlos Guestrin, University of Washington, Seattle (guestrin@cs.washington.edu)
Pseudocode | Yes | Algorithm 4.1, RAIS-SGD (a generic importance-sampling sketch follows this table)
Open Source Code | No | The paper does not include any explicit statements about releasing source code or providing a link to an open-source repository for the described methodology.
Open Datasets | Yes | "For our remaining comparisons, we consider street view house numbers [25], rotated MNIST [26], and CIFAR tiny image [27] datasets."
Dataset Splits | No | The paper states it uses validation performance and lists the total number of training examples for each dataset, but does not explicitly provide specific train/validation/test split percentages or sample counts, nor does it describe the exact split methodology.
Hardware Specification | No | The paper mentions 'an isolated machine' for one experiment but does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.y', 'CUDA z.a').
Experiment Setup | Yes | "We use learning rate η(t) = 3.4/√(100 + t), L2 penalty λ = 2.5×10⁻⁴, and batch size 32... We use batch normalization and standard momentum of 0.9. For rot-MNIST, we follow [28], augmenting data with random rotations and training with dropout. For the CIFAR problems, we augment the training set with random horizontal reflections and random crops (pad to 40×40 pixels; crop to 32×32). We train the SVHN model with batch size 64 and the remaining models with |M| = 128... The learning rate schedule decreases by a fixed fraction after each epoch... This fraction is 0.8 for SVHN, 0.972 for rot-MNIST, 0.96 for CIFAR-10, and 0.96 for CIFAR-100. The initial learning rates are 0.15, 0.09, 0.08, and 0.1, respectively. We use λ = 3×10⁻³ for rot-MNIST and λ = 5×10⁻⁴ otherwise." (A configuration sketch for the CIFAR-10 settings follows this table.)
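
The Pseudocode row above refers to Algorithm 4.1 (RAIS-SGD) in the paper. The block below is a minimal, generic sketch of one importance-sampled SGD step, not a reproduction of Algorithm 4.1: it assumes per-example sampling scores are already available, whereas RAIS's contribution is robustly approximating those scores during training. The function and argument names are illustrative.

```python
import torch

def importance_sampled_sgd_step(model, loss_fn, optimizer, inputs, targets,
                                scores, batch_size):
    """One SGD step with non-uniform (importance) sampling.

    scores: 1-D float tensor of per-example sampling scores (e.g. estimated
            gradient-norm bounds), treated as constants here.
    loss_fn: must return per-example losses (reduction='none').
    """
    n = inputs.shape[0]
    probs = scores.detach() / scores.sum()              # sampling distribution p_i
    idx = torch.multinomial(probs, batch_size, replacement=True)
    weights = 1.0 / (n * probs[idx])                    # 1/(n p_i) correction

    optimizer.zero_grad()
    per_example_loss = loss_fn(model(inputs[idx]), targets[idx])
    (weights * per_example_loss).mean().backward()      # weighted mini-batch objective
    optimizer.step()
```

The 1/(n·p_i) reweighting keeps the mini-batch gradient an unbiased estimate of the full training objective's gradient regardless of how skewed the sampling distribution is; better scores simply reduce its variance.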
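
The Experiment Setup row quotes concrete hyperparameters. Below is a minimal sketch of the CIFAR-10 configuration, assuming a PyTorch/torchvision stack; the paper does not name its software dependencies (see the Software Dependencies row), and the small network here is only a placeholder for the paper's CNN.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# CIFAR augmentation described above: random horizontal reflections and
# random crops (pad 32x32 images to 40x40, i.e. padding=4, then crop to 32x32).
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)   # |M| = 128 for the CIFAR models

# Placeholder network; the paper uses its own CNN with batch normalization.
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.08,             # quoted initial learning rate for CIFAR-10
    momentum=0.9,        # "standard momentum of 0.9"
    weight_decay=5e-4)   # L2 penalty lambda = 5e-4

# "The learning rate schedule decreases by a fixed fraction after each epoch";
# that fraction is 0.96 for CIFAR-10.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

criterion = nn.CrossEntropyLoss()
for epoch in range(2):                         # epoch count is a placeholder
    for batch_inputs, batch_targets in train_loader:
        optimizer.zero_grad()
        criterion(model(batch_inputs), batch_targets).backward()
        optimizer.step()
    scheduler.step()
```

The SVHN and rot-MNIST runs differ only in the quoted batch size, initial learning rate, per-epoch decay fraction, and L2 penalty λ.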