Curriculum By Smoothing

Authors: Samarth Sinha, Animesh Garg, Hugo Larochelle

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The generality of our method is demonstrated through empirical performance gains in CNN architectures across four different tasks: image classification, transfer learning, cross-task transfer learning, and generative models. We conduct image classification experiments using commonly-used vision datasets and CNN architecture variants to evaluate the effect of controlling the smoothing of feature maps during training.
Researcher Affiliation | Collaboration | Samarth Sinha 1, Animesh Garg 1,2, Hugo Larochelle 3. 1 University of Toronto, Vector Institute; 2 Nvidia; 3 Mila, Google Brain, CIFAR Fellow. Corresponding author: samarth.sinha@mail.utoronto.ca
Pseudocode | Yes | A sample PyTorch-like code snippet is available below for a two-layer CNN, to illustrate its ease of implementation [42]. (An illustrative, unofficial sketch of such a snippet follows this table.)
Open Source Code | No | The code will soon be released at www.github.com/pairlab/CBS.
Open Datasets | Yes | For image classification we evaluate the performance of our curriculum based networks on standard vision datasets. We test our methods on CIFAR10, CIFAR100 [30] and SVHN [15]. ... Finally, to prove that our network can scale to larger datasets, we evaluate on the ImageNet dataset [47]. (A torchvision loading sketch follows this table.)
Dataset Splits | No | The paper mentions using standard datasets like CIFAR10, CIFAR100, SVHN, and ImageNet, but it does not explicitly specify the training, validation, or test dataset splits (e.g., percentages or sample counts) or cite where these specific splits are defined for reproducibility.
Hardware Specification | Yes | Finally, we would like to acknowledge Nvidia for donating DGX-1, and Vector Institute for providing resources for this research.
Software Dependencies | No | The paper mentions a 'PyTorch-like code snippet' and cites the PyTorch paper [42], but it does not specify any software names with version numbers (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | Unless otherwise noted, for all experiments except ImageNet, we use an initial σ of 1, a σ decay rate of 0.9, and decay the value of σ every 5 epochs. For ImageNet we decay the value of σ two times every epoch, by the same factor, since the dataset is significantly larger in size. In all the experiments, the networks are trained using the Adam optimizer [27] with a learning rate of 10^-4 for 20 epochs. (A training-loop sketch with this schedule follows this table.)
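
For the Pseudocode row: since the paper's own snippet is not reproduced here and the official code is not yet released, the following is a minimal PyTorch sketch of the idea, assuming Curriculum By Smoothing convolves each convolutional layer's output with a fixed Gaussian kernel whose standard deviation σ is annealed during training. The names `gaussian_kernel` and `TwoLayerCNN` are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(sigma, channels, kernel_size=3):
    """Build a depthwise 2D Gaussian kernel of shape (channels, 1, k, k)."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2.0
    g1d = torch.exp(-(coords ** 2) / (2.0 * sigma ** 2))
    g1d = g1d / g1d.sum()
    g2d = g1d[:, None] * g1d[None, :]                      # (k, k)
    return g2d.expand(channels, 1, kernel_size, kernel_size).contiguous()


class TwoLayerCNN(nn.Module):
    """Two-layer CNN with CBS-style Gaussian smoothing of the feature maps."""

    def __init__(self, num_classes=10, sigma=1.0):
        super().__init__()
        self.sigma = sigma                                  # annealed externally during training
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, num_classes)

    def smooth(self, x):
        # Convolve each channel with a fixed (non-learned) Gaussian kernel.
        k = gaussian_kernel(self.sigma, x.shape[1]).to(x.device)
        return F.conv2d(x, k, padding=1, groups=x.shape[1])

    def forward(self, x):
        x = F.relu(self.smooth(self.conv1(x)))
        x = F.relu(self.smooth(self.conv2(x)))
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)
        return self.fc(x)
```

As σ shrinks, the 3×3 Gaussian approaches a delta at the center pixel, so the smoothing gradually fades out and high-frequency information is progressively restored, which is the curriculum the paper describes.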
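For the Open Datasets row: a minimal sketch of loading the cited benchmarks with torchvision; the root paths, transform, and splits shown here are assumptions, not the paper's exact data pipeline.

```python
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentation pipeline is not specified

cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar100 = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
svhn = datasets.SVHN(root="./data", split="train", download=True, transform=transform)
# ImageNet has no automatic download; it expects a locally prepared copy:
# imagenet = datasets.ImageNet(root="/path/to/imagenet", split="train", transform=transform)
```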
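For the Experiment Setup row: a minimal training-loop sketch of the reported schedule (initial σ = 1, decayed by a factor of 0.9 every 5 epochs, Adam with learning rate 10^-4, 20 epochs), reusing the illustrative `TwoLayerCNN` and `cifar10` objects from the sketches above; the batch size is an assumption.

```python
import torch
from torch.utils.data import DataLoader

model = TwoLayerCNN(num_classes=10, sigma=1.0)             # from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 10^-4, as reported
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(cifar10, batch_size=128, shuffle=True)  # batch size is an assumption

for epoch in range(20):                                    # 20 epochs, as reported
    if epoch > 0 and epoch % 5 == 0:
        model.sigma *= 0.9                                 # decay sigma by 0.9 every 5 epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```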