Curriculum By Smoothing
Authors: Samarth Sinha, Animesh Garg, Hugo Larochelle
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The generality of our method is demonstrated through empirical performance gains in CNN architectures across four different tasks: transfer learning, cross-task transfer learning, and generative models. We conduct image classification experiments using commonly-used vision datasets and CNN architecture variants to evaluate the effect of controlling the smoothing of feature maps during training. |
| Researcher Affiliation | Collaboration | Samarth Sinha (1), Animesh Garg (1, 2), Hugo Larochelle (3); (1) University of Toronto, Vector Institute; (2) Nvidia; (3) Mila, Google Brain, CIFAR Fellow. Corresponding author: samarth.sinha@mail.utoronto.ca |
| Pseudocode | Yes | A sample PyTorch-like code snippet for a two-layer CNN is provided in the paper to illustrate its ease of implementation [42]. (A hedged re-implementation sketch follows this table.) |
| Open Source Code | No | The code will soon be released at www.github.com/pairlab/CBS. |
| Open Datasets | Yes | For image classification we evaluate the performance of our curriculum-based networks on standard vision datasets. We test our methods on CIFAR10, CIFAR100 [30] and SVHN [15]. ... Finally, to prove that our network can scale to larger datasets, we evaluate on the ImageNet dataset [47]. |
| Dataset Splits | No | The paper mentions using standard datasets like CIFAR10, CIFAR100, SVHN, and ImageNet, but it does not explicitly specify the training, validation, or test dataset splits (e.g., percentages or sample counts) or cite where these specific splits are defined for reproducibility. |
| Hardware Specification | Yes | Finally, we would like to acknowledge Nvidia for donating DGX-1, and Vector Institute for providing resources for this research. |
| Software Dependencies | No | The paper mentions a 'PyTorch-like code snippet' and cites the PyTorch paper [42], but it does not specify any software names with version numbers (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | Unless otherwise noted, for all experiments except ImageNet, we use an initial σ of 1, a σ decay rate of 0.9, and decay the value of σ every 5 epochs. For ImageNet we decay the value of σ twice every epoch, by the same factor, since the dataset is significantly larger. In all the experiments, the networks are trained using the Adam optimizer [27] with a learning rate of 10^-4 for 20 epochs. (A schedule sketch follows this table.) |
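
The paper's own snippet is not reproduced in this report. The block below is a minimal PyTorch sketch of the CBS idea referenced in the Pseudocode row, under the assumption that the method blurs each convolution's output feature map with a Gaussian kernel of standard deviation σ that is annealed during training. The names `gaussian_kernel`, `GaussianSmoothing`, `TwoLayerCBSCNN`, and `set_sigma` are illustrative, not the authors' released code.

```python
# Illustrative sketch of Curriculum By Smoothing (CBS); not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_kernel(kernel_size: int, sigma: float, channels: int) -> torch.Tensor:
    """Build a depthwise 2D Gaussian kernel of shape (channels, 1, k, k)."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2.0
    g1d = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g1d = g1d / g1d.sum()                      # normalise so the kernel sums to 1
    g2d = g1d[:, None] * g1d[None, :]          # separable outer product -> 2D kernel
    return g2d.expand(channels, 1, kernel_size, kernel_size).contiguous()


class GaussianSmoothing(nn.Module):
    """Blur each feature-map channel independently via a depthwise convolution."""

    def __init__(self, channels: int, kernel_size: int = 3, sigma: float = 1.0):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        self.sigma = sigma
        self.register_buffer("kernel", gaussian_kernel(kernel_size, sigma, channels))

    def set_sigma(self, sigma: float) -> None:
        # Rebuild the kernel with the new sigma as the curriculum anneals.
        self.sigma = sigma
        device = self.kernel.device
        self.kernel = gaussian_kernel(self.kernel_size, sigma, self.channels).to(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.kernel, padding=self.kernel_size // 2,
                        groups=self.channels)


class TwoLayerCBSCNN(nn.Module):
    """Two-layer CNN that smooths the output of every convolution (CBS-style)."""

    def __init__(self, in_channels: int = 3, num_classes: int = 10, sigma: float = 1.0):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=3, padding=1)
        self.smooth1 = GaussianSmoothing(32, sigma=sigma)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.smooth2 = GaussianSmoothing(64, sigma=sigma)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def set_sigma(self, sigma: float) -> None:
        self.smooth1.set_sigma(sigma)
        self.smooth2.set_sigma(sigma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.smooth1(self.conv1(x)))
        x = F.relu(self.smooth2(self.conv2(x)))
        return self.fc(self.pool(x).flatten(1))
```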
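
Likewise, the following is a rough sketch of the schedule reported in the Experiment Setup row (initial σ of 1, decay factor 0.9 every 5 epochs, Adam with learning rate 10^-4, 20 epochs). It assumes a model exposing `set_sigma` as in the sketch above and an arbitrary `DataLoader`; `train_cbs` is a hypothetical helper, and the twice-per-epoch ImageNet variant is not shown.

```python
# Illustrative CBS training schedule; parameter values taken from the row above.
from torch import nn, optim


def train_cbs(model, loader, device="cpu",
              sigma0=1.0, decay=0.9, decay_every=5, epochs=20, lr=1e-4):
    """Train a model that exposes set_sigma() with the annealed-smoothing schedule."""
    model.to(device)
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    sigma = sigma0
    model.set_sigma(sigma)
    for epoch in range(epochs):
        if epoch > 0 and epoch % decay_every == 0:
            sigma *= decay              # multiply sigma by 0.9 every 5 epochs
            model.set_sigma(sigma)
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model


# Example usage (hypothetical loader):
# model = train_cbs(TwoLayerCBSCNN(num_classes=10), some_cifar10_loader)
```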