Born Again Neural Networks
Authors: Tommaso Furlanello, Zachary Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives... |
| Researcher Affiliation | Collaboration | 1University of Southern California, Los Angeles, CA, USA 2Carnegie Mellon University, Pittsburgh, PA, USA 3Amazon AI, Palo Alto, CA, USA 4ETH Zürich, Zürich, Switzerland 5Caltech, Pasadena, CA, USA. |
| Pseudocode | No | The paper describes its procedures using mathematical equations and textual descriptions, but does not include structured pseudocode or algorithm blocks (an illustrative, hedged sketch of the training loop is given after the table). |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | All experiments performed on CIFAR-100 use the same preprocessing and training setting as for Wide-ResNet (Zagoruyko & Komodakis, 2016b)... To validate our method beyond computer vision applications, we also apply the BAN framework to language models and evaluate it on the Penn Tree Bank (PTB) dataset (Marcus et al., 1993) |
| Dataset Splits | Yes | We consider two BAN language models: a single-layer LSTM (Hochreiter & Schmidhuber, 1997) with 1500 units (Zaremba et al., 2014) and a smaller model from (Kim et al., 2016) combining convolutional layers, highway layers, and a 2-layer LSTM (referred to as CNN-LSTM). ... using the standard train/test/validation split by (Mikolov et al., 2010). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used in its experiments (PyTorch is mentioned only in the references, not with a version for the authors' own implementation). |
| Experiment Setup | Yes | All experiments performed on CIFAR-100 use the same preprocessing and training setting as for Wide-ResNet (Zagoruyko & Komodakis, 2016b) except for Mean-Std normalization. The only forms of regularization used other than the KD loss are weight decay and, in the case of Wide-ResNet, dropout. ... For the LSTM model we use weight tying (Press & Wolf, 2016), 65% dropout and train for 40 epochs using SGD with a mini-batch size of 32. An adaptive learning rate schedule is used with an initial learning rate of 1 that is multiplied by a factor of 0.25 if the validation perplexity does not decrease after an epoch (a sketch of this learning-rate rule follows the table). |
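
Because the paper provides neither pseudocode nor released code, the following is a minimal, hypothetical sketch of the born-again (sequential self-distillation) procedure it describes: each generation's student has the same architecture as its teacher and is trained against both the ground-truth labels and the previous generation's soft predictions. Names such as `model_fn`, `train_loader`, the optimizer settings, and the unweighted combination of the two loss terms are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of a born-again (sequential self-distillation) loop.
# Not the authors' implementation, which was not released.
import copy
import torch
import torch.nn.functional as F


def train_generation(student, teacher, train_loader, epochs, lr=0.1):
    """Train one generation: match ground-truth labels and, if a teacher
    is given, the teacher's soft predictions (generic KD term)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9,
                          weight_decay=1e-4)
    for _ in range(epochs):
        for x, y in train_loader:
            logits = student(x)
            loss = F.cross_entropy(logits, y)            # supervised term
            if teacher is not None:
                with torch.no_grad():
                    t_logits = teacher(x)
                # distillation term: KL between student and teacher softmax
                loss = loss + F.kl_div(F.log_softmax(logits, dim=1),
                                       F.softmax(t_logits, dim=1),
                                       reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


def born_again(model_fn, train_loader, generations=3, epochs=300):
    """Sequentially train `generations` identical networks, each distilled
    from the previous one; generation 0 trains from labels alone."""
    teacher = None
    for _ in range(generations):
        student = model_fn()                             # same architecture
        student = train_generation(student, teacher, train_loader, epochs)
        teacher = copy.deepcopy(student).eval()          # becomes next teacher
    return teacher
```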
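
The adaptive learning-rate rule quoted in the experiment setup (initial rate 1, multiplied by 0.25 whenever validation perplexity does not decrease after an epoch) can be written down as the short sketch below. `train_one_epoch` and `evaluate_perplexity` are hypothetical placeholders standing in for the LSTM language-model training and evaluation code, which the paper does not provide.

```python
# Sketch of the quoted adaptive learning-rate schedule; the two helpers are
# hypothetical placeholders, not the authors' code.
import math
import torch


def train_one_epoch(model, optimizer, train_data):
    """Placeholder: one epoch of LSTM language-model training
    (SGD, mini-batch size 32, as described in the quoted setup)."""
    raise NotImplementedError


def evaluate_perplexity(model, val_data):
    """Placeholder: exp(mean cross-entropy) on the validation split."""
    raise NotImplementedError


def fit_language_model(model, train_data, val_data, epochs=40, initial_lr=1.0):
    """Multiply the learning rate by 0.25 whenever validation perplexity
    fails to decrease after an epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=initial_lr)
    best_ppl = math.inf
    for _ in range(epochs):
        train_one_epoch(model, optimizer, train_data)
        ppl = evaluate_perplexity(model, val_data)
        if ppl < best_ppl:
            best_ppl = ppl
        else:
            # validation perplexity did not improve: shrink the learning rate
            for group in optimizer.param_groups:
                group["lr"] *= 0.25
    return model
```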