Not All Samples Are Created Equal: Deep Learning with Importance Sampling
Authors: Angelos Katharopoulos, Francois Fleuret
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally, on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the train losses of up to an order of magnitude and a relative improvement of test errors between 5% and 17%. |
| Researcher Affiliation | Academia | 1Idiap Research Institute, Martigny, Switzerland 2EPFL, Lausanne, Switzerland. Correspondence to: Angelos Katharopoulos <firstname.lastname@idiap.ch>. |
| Pseudocode | Yes | Algorithm 1 Deep Learning with Importance Sampling — 1: Inputs B, b, τ_th, a_τ, θ_0 2: t ← 1 3: τ ← 0 4: repeat 5: if τ > τ_th then 6: U ← B uniformly sampled datapoints 7: g_i ← Ĝ_i, i ∈ U, according to eq. 20 8: G ← b datapoints sampled with g_i from U 9: w_i ← 1/(B g_i), i ∈ G 10: θ_t ← sgd_step(w_i, G, θ_{t−1}) 11: else 12: U ← b uniformly sampled datapoints 13: w_i ← 1, i ∈ U 14: θ_t ← sgd_step(w_i, U, θ_{t−1}) 15: g_i ← Ĝ_i, i ∈ U 16: end if 17: τ ← a_τ τ + (1 − a_τ) · 1/(1 − (1/B) Σ_{i=1}^B g_i²/ḡ²) 18: until convergence (a runnable sketch of this loop appears after the table) |
| Open Source Code | Yes | Experiments were conducted using Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2016), and the code can be found at http://github.com/idiap/importance-sampling. |
| Open Datasets | Yes | Our experimental setup is as follows: we train a wide residual network (Zagoruyko & Komodakis, 2016) on the CIFAR100 dataset (Krizhevsky, 2009), following closely the training procedure of Zagoruyko & Komodakis (2016) (the details are presented in 4.2). Subsequently, we sample 1,024 images uniformly at random from the dataset. |
| Dataset Splits | No | The paper mentions using the CIFAR10, CIFAR100, and pixel-by-pixel MNIST datasets. It details batch sizes, learning rates, and training iterations, but it does not explicitly state a validation split percentage or count. It reports 'test errors' and 'training loss' but does not describe a validation set or how hyperparameters were tuned, which limits reproducibility. |
| Hardware Specification | Yes | For all the experiments, we use Nvidia K80 GPUs and the reported time is calculated by subtracting the timestamps before starting one epoch and after finishing one; thus it includes the time needed to transfer data between CPU and GPU memory. |
| Software Dependencies | No | Experiments were conducted using Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2016). While Keras and TensorFlow are mentioned as software used, specific version numbers are not provided. |
| Experiment Setup | Yes | In this section, we use importance sampling to train a residual network on CIFAR10 and CIFAR100. We follow the experimental setup of Zagoruyko & Komodakis (2016), specifically we train a wide resnet 28-2 with SGD with momentum. We use batch size 128, weight decay 0.0005, momentum 0.9, initial learning rate 0.1 divided by 5 after 20,000 and 40,000 parameter updates. Finally, we train for a total of 50,000 iterations. (A minimal sketch of this learning-rate schedule appears below.) |
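
The algorithm quoted in the Pseudocode row alternates between plain SGD during a warm-up phase and importance sampling once an exponentially averaged speedup estimate τ exceeds a threshold τ_th. The NumPy sketch below illustrates only that control flow; `importance_sampling_sgd`, `score_fn`, `sgd_step`, the `len(g) * ||g||²` speedup proxy, and the default threshold are illustrative assumptions, not the paper's exact estimator or the released Keras implementation.

```python
import numpy as np

def importance_sampling_sgd(n, score_fn, sgd_step, theta, num_steps,
                            B=640, b=128, tau_th=None, a_tau=0.9, seed=0):
    """Sketch of the importance-sampling SGD loop (cf. Algorithm 1).

    n         -- number of training points
    score_fn  -- score_fn(theta, idx) -> positive scores for the points in idx
                 (stand-in for the per-sample gradient-norm upper bound, eq. 20)
    sgd_step  -- sgd_step(theta, idx, w) -> updated parameters, where w holds
                 the per-sample weights that keep the gradient estimate unbiased
    """
    rng = np.random.default_rng(seed)
    if tau_th is None:
        # One way to set the threshold: each biased step adds roughly B extra
        # forward passes; if a forward pass costs about a third of a
        # forward+backward pass, the scheme pays off once the estimated
        # speedup exceeds (B + 3b) / (3b).
        tau_th = (B + 3 * b) / (3 * b)
    tau = 0.0
    for _ in range(num_steps):
        if tau > tau_th:
            # Presample B points uniformly, score them, then draw the actual
            # batch of b points with probability proportional to the scores.
            U = rng.choice(n, size=B, replace=False)
            scores = np.asarray(score_fn(theta, U), dtype=float)
            g = scores / scores.sum()
            picked = rng.choice(B, size=b, replace=True, p=g)
            w = 1.0 / (B * g[picked])            # w_i = 1 / (B g_i)
            theta = sgd_step(theta, U[picked], w)
        else:
            # Plain uniform SGD step; the mini-batch is still scored so the
            # speedup estimate keeps being refreshed during warm-up.
            U = rng.choice(n, size=b, replace=False)
            theta = sgd_step(theta, U, np.ones(b))
            scores = np.asarray(score_fn(theta, U), dtype=float)
            g = scores / scores.sum()
        # Exponential moving average of an estimated speedup: len(g) * ||g||^2
        # equals 1 for a uniform score distribution and len(g) for a fully
        # concentrated one (illustrative stand-in for the paper's estimator).
        tau = a_tau * tau + (1.0 - a_tau) * len(g) * np.sum(g ** 2)
    return theta
```

Sampling the batch with replacement from the presample keeps the 1/(B·g_i) reweighting an unbiased correction, and the warm-up branch mirrors lines 12–15 of the quoted pseudocode: take a uniform step, then score the same points to update τ.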
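
As a small aid to reading the quoted schedule ("initial learning rate 0.1 divided by 5 after 20,000 and 40,000 parameter updates"), here is a minimal, framework-agnostic sketch of that step schedule; the function name and defaults are illustrative only, assuming the division is applied cumulatively at each threshold.

```python
def learning_rate(iteration, base_lr=0.1, drop_at=(20_000, 40_000), factor=5.0):
    """Step schedule: base_lr, divided by `factor` after each threshold in `drop_at`."""
    drops = sum(iteration >= t for t in drop_at)
    return base_lr / factor ** drops

for it in (0, 25_000, 45_000):
    print(it, learning_rate(it))  # 0.1, then 0.02, then 0.004 (up to float rounding)
```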