Training Deep Models Faster with Robust, Approximate Importance Sampling
Authors: Tyler B. Johnson, Carlos Guestrin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes much more. From Section 6 (Empirical comparisons): In this section, we demonstrate how RAIS performs in practice. We consider the very popular task of training a convolutional neural network to classify images. |
| Researcher Affiliation | Academia | Tyler B. Johnson, University of Washington, Seattle (tbjohns@washington.edu); Carlos Guestrin, University of Washington, Seattle (guestrin@cs.washington.edu) |
| Pseudocode | Yes | Algorithm 4.1 RAIS-SGD (see the importance-sampling sketch after the table) |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code or providing a link to an open-source repository for the described methodology. |
| Open Datasets | Yes | For our remaining comparisons, we consider street view house numbers [25], rotated MNIST [26], and CIFAR tiny image [27] datasets. |
| Dataset Splits | No | The paper states it uses validation performance and lists the total number of training examples for each dataset, but does not explicitly provide specific train/validation/test split percentages or sample counts, nor does it cite the exact split methodology. |
| Hardware Specification | No | The paper mentions 'an isolated machine' for one experiment but does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.y', 'CUDA z.a'). |
| Experiment Setup | Yes | We use learning rate η(t) = 3.4/√(100 + t), L2 penalty λ = 2.5×10⁻⁴, and batch size 32... We use batch normalization and standard momentum of 0.9. For rot-MNIST, we follow [28], augmenting data with random rotations and training with dropout. For the CIFAR problems, we augment the training set with random horizontal reflections and random crops (pad to 40×40 pixels; crop to 32×32). We train the SVHN model with batch size 64 and the remaining models with |M| = 128... The learning rate schedule decreases by a fixed fraction after each epoch... This fraction is 0.8 for SVHN, 0.972 for rot-MNIST, 0.96 for CIFAR-10, and 0.96 for CIFAR-100. The initial learning rates are 0.15, 0.09, 0.08, and 0.1, respectively. We use λ = 3×10⁻³ for rot-MNIST and λ = 5×10⁻⁴ otherwise. (These hyperparameters are restated in the configuration sketch after the table.) |
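
For context on the Pseudocode row, the sketch below illustrates the general idea behind importance-sampled SGD: draw training examples with non-uniform probabilities and reweight each gradient by 1/(n·pᵢ) so the mini-batch gradient stays unbiased. This is a minimal, generic sketch, not the paper's Algorithm 4.1 (RAIS-SGD), which additionally approximates the sampling distribution robustly; the function name, the quadratic loss, and the synthetic data are assumptions made purely for illustration.

```python
# Minimal sketch of importance-sampled SGD on a linear least-squares model.
# NOT the paper's Algorithm 4.1 (RAIS-SGD); names and loss are illustrative.
import numpy as np

def importance_sampled_sgd_step(w, X, y, probs, lr, batch_size, l2):
    """One SGD step where examples are drawn with probabilities `probs`
    and gradients are reweighted by 1 / (n * p_i) to stay unbiased."""
    n = X.shape[0]
    idx = np.random.choice(n, size=batch_size, p=probs)
    weights = 1.0 / (n * probs[idx])          # inverse-probability weights
    residual = X[idx] @ w - y[idx]            # per-example error
    grad = (X[idx] * (weights * residual)[:, None]).mean(axis=0) + l2 * w
    return w - lr * grad

# Uniform probabilities recover plain mini-batch SGD; RAIS-SGD would instead
# adapt `probs` from robust approximations of per-example gradient norms.
np.random.seed(0)
X = np.random.normal(size=(1000, 10))
y = np.random.normal(size=1000)
w = np.zeros(10)
probs = np.ones(1000) / 1000
w = importance_sampled_sgd_step(w, X, y, probs, lr=0.1, batch_size=32, l2=2.5e-4)
```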
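
The Experiment Setup row quotes the paper's hyperparameters; the sketch below restates them as a small configuration dictionary plus the two learning-rate schedules mentioned (the √-decay rule for the first experiment and the per-epoch fixed-fraction decay for the remaining models). It is a readability aid only, assuming the names `lr_sqrt_decay`, `lr_epoch_decay`, and `CONFIGS`; any setting not quoted above is omitted.

```python
# Restating the quoted hyperparameters; function and dict names are assumptions.
import math

def lr_sqrt_decay(t, scale=3.4, offset=100):
    """eta(t) = 3.4 / sqrt(100 + t), the schedule quoted for the first experiment."""
    return scale / math.sqrt(offset + t)

def lr_epoch_decay(epoch, initial_lr, decay_fraction):
    """Schedule that decreases by a fixed fraction after each epoch."""
    return initial_lr * decay_fraction ** epoch

# Per-dataset settings quoted in the Experiment Setup row.
CONFIGS = {
    "SVHN":      {"initial_lr": 0.15, "decay": 0.800, "batch_size": 64,  "l2": 5e-4},
    "rot-MNIST": {"initial_lr": 0.09, "decay": 0.972, "batch_size": 128, "l2": 3e-3},
    "CIFAR-10":  {"initial_lr": 0.08, "decay": 0.960, "batch_size": 128, "l2": 5e-4},
    "CIFAR-100": {"initial_lr": 0.10, "decay": 0.960, "batch_size": 128, "l2": 5e-4},
}

print(lr_sqrt_decay(t=0))                                             # 0.34
print(lr_epoch_decay(epoch=10, initial_lr=0.15, decay_fraction=0.8))  # ~0.016
```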