Born Again Neural Networks

Authors: Tommaso Furlanello, Zachary Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives... (A sketch of the distillation loss follows the table.)
Researcher Affiliation | Collaboration | (1) University of Southern California, Los Angeles, CA, USA; (2) Carnegie Mellon University, Pittsburgh, PA, USA; (3) Amazon AI, Palo Alto, CA, USA; (4) ETH Zürich, Zürich, Switzerland; (5) Caltech, Pasadena, CA, USA.
Pseudocode | No | The paper describes procedures using mathematical equations and textual descriptions, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | All experiments performed on CIFAR-100 use the same preprocessing and training setting as for Wide-ResNet (Zagoruyko & Komodakis, 2016b)... To validate our method beyond computer vision applications, we also apply the BAN framework to language models and evaluate it on the Penn Tree Bank (PTB) dataset (Marcus et al., 1993).
Dataset Splits | Yes | We consider two BAN language models: a single-layer LSTM (Hochreiter & Schmidhuber, 1997) with 1500 units (Zaremba et al., 2014) and a smaller model from (Kim et al., 2016) combining convolutional layers, highway layers, and a 2-layer LSTM (referred to as CNN-LSTM). ... using the standard train/test/validation split of (Mikolov et al., 2010).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the experiments; PyTorch appears only in the references, and no version is given for the authors' own implementation.
Experiment Setup | Yes | All experiments performed on CIFAR-100 use the same preprocessing and training setting as for Wide-ResNet (Zagoruyko & Komodakis, 2016b) except for Mean-Std normalization. The only forms of regularization used other than the KD loss are weight decay and, in the case of Wide-ResNet, drop-out. ... For the LSTM model we use weight tying (Press & Wolf, 2016), 65% dropout and train for 40 epochs using SGD with a mini-batch size of 32. An adaptive learning rate schedule is used with an initial learning rate of 1 that is multiplied by a factor of 0.25 if the validation perplexity does not decrease after an epoch. (Illustrative sketches of the CIFAR preprocessing and this learning-rate schedule follow the table.)
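
The Research Type row quotes the paper's mention of distillation objectives without further detail. As a point of reference, here is a minimal PyTorch-style sketch of a born-again distillation loss: cross-entropy on the ground-truth labels plus a KL term pulling the student toward the teacher (the previous generation of the same architecture). The paper releases no code, so the function name, the temperature, and the `alpha` weighting below are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def ban_distillation_loss(student_logits, teacher_logits, labels,
                          temperature=1.0, alpha=0.5):
    """Hedged sketch of a born-again distillation objective.

    `temperature` and `alpha` are illustrative hyperparameters; the paper
    does not publish reference code, so this is not the authors' exact loss.
    """
    # Hard-label term: ordinary cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence pulling the student's softened
    # distribution toward the teacher's ("dark knowledge").
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    return alpha * ce + (1.0 - alpha) * kd
```

In the born-again setting, teacher and student share the same architecture; once the student converges, it can serve as the teacher for the next generation.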
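
The CIFAR rows state that preprocessing follows the Wide-ResNet setup "except for Mean-Std normalization". Below is a sketch of that standard pipeline (padding-4 random crop plus horizontal flip), assuming torchvision; the normalization step is left commented out because the quote is ambiguous about whether it is applied, and the statistics shown are the commonly used CIFAR-100 values rather than numbers from the paper.

```python
from torchvision import transforms

# Standard Wide-ResNet-style CIFAR-100 training augmentation (assumption:
# torchvision is used; the paper does not specify its data pipeline code).
cifar100_train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad by 4 pixels, random 32x32 crop
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    # The quote singles out mean-std normalization as the one deviation from
    # the Wide-ResNet recipe; the values below are the usual CIFAR-100
    # statistics, shown only for illustration.
    # transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])
```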
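
The Experiment Setup row describes the PTB language-model schedule in prose. The following is a minimal training-loop sketch of those settings (SGD, initial learning rate 1, mini-batch size 32, 40 epochs, learning rate multiplied by 0.25 when validation perplexity does not decrease). `model`, `train_loader`, and `evaluate_perplexity` are hypothetical placeholders, and the distillation term is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def train_ptb_lstm(model, train_loader, evaluate_perplexity, epochs=40):
    # SGD with the initial learning rate of 1 quoted above.
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
    prev_ppl = float("inf")

    for epoch in range(epochs):
        model.train()
        for inputs, targets in train_loader:   # mini-batches of size 32
            optimizer.zero_grad()
            logits = model(inputs)             # (batch * seq_len, vocab) assumed
            loss = F.cross_entropy(logits, targets)
            loss.backward()
            optimizer.step()

        # Adaptive schedule: multiply the learning rate by 0.25 whenever the
        # validation perplexity fails to decrease relative to the last epoch.
        val_ppl = evaluate_perplexity(model)
        if val_ppl >= prev_ppl:
            for group in optimizer.param_groups:
                group["lr"] *= 0.25
        prev_ppl = val_ppl
```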