Net2Net: Accelerating Learning via Knowledge Transfer

Authors: Tianqi Chen, Ian Goodfellow, Jonathon Shlens

ICLR 2016

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We evaluated our Net2Net operators in three different settings. In all cases we used an Inception-BN network (Ioffe & Szegedy, 2015) trained on ImageNet. In the first setting, we demonstrate that Net2WiderNet can be used to accelerate the training of a standard Inception network by initializing it with a smaller network. In the second setting, we demonstrate that Net2DeeperNet allows us to increase the depth of the Inception modules. Finally, we use our Net2Net operators in a realistic setting, where we make more dramatic changes to the model size and explore the model space for better performance. In this setting, we demonstrate an improved result on ImageNet." (A sketch of the Net2DeeperNet identity initialization appears after this table.)
Researcher Affiliation | Collaboration | Tianqi Chen, Ian Goodfellow, and Jonathon Shlens; Google Inc., Mountain View, CA; tqchen@cs.washington.edu, {goodfellow,shlens}@google.com
Pseudocode | Yes | Algorithm 1: Net2WiderNet (a minimal NumPy sketch of the widening rule appears after this table).
Open Source Code | No | The paper mentions using TensorFlow: "We would like to thank the developers of TensorFlow (Abadi et al., 2015), which we used for all of our experiments." It provides a URL for TensorFlow but does not state that the authors are releasing their own code for the described methodology.
Open Datasets | Yes | "In all cases we used an Inception-BN network (Ioffe & Szegedy, 2015) trained on ImageNet."
Dataset Splits | Yes | "Accuracy on Validation Set" (titles of Figures 4, 5, and 6) and "This last approach paid off, yielding a model that sets a new state of the art of 78.5% on our ImageNet validation set."
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al., 2015)" but does not provide a specific version number for it or any other ancillary software components.
Experiment Setup | Yes | "We found that the initial learning rate for the student network should be approximately 1/10 the initial learning rate for the teacher network."
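
As a companion to the Pseudocode row above, here is a minimal NumPy sketch of the Net2WiderNet remapping rule from Algorithm 1: each new unit copies the incoming weights of a randomly chosen existing unit, and the outgoing weights of each replicated unit are divided by its replica count so the widened network computes the same function. Function and variable names are illustrative, not taken from any released code.

```python
import numpy as np

def net2wider(W1, b1, W2, new_width, noise_std=0.0):
    """Widen a hidden layer from n to new_width units, function-preserving."""
    n = W1.shape[1]
    assert new_width >= n, "Net2WiderNet can only grow the layer"
    # Random mapping g: each of the new_width units points at an original
    # unit; the first n units map to themselves so every original is kept.
    g = np.concatenate([np.arange(n),
                        np.random.randint(0, n, size=new_width - n)])
    counts = np.bincount(g, minlength=n)    # replica count per original unit
    U1 = W1[:, g]                           # incoming weights: copy columns
    c1 = b1[g]                              # biases: copy
    U2 = W2[g, :] / counts[g][:, None]      # outgoing: divide by replica count
    if noise_std > 0.0:
        U2 = U2 + np.random.randn(*U2.shape) * noise_std  # break symmetry
    return U1, c1, U2
```

For example, `U1, c1, U2 = net2wider(W1, b1, W2, 150)` widens a 100-unit layer to 150 units while leaving the network's outputs unchanged (exactly so when `noise_std=0`).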
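
The Net2DeeperNet operator referenced in the Research Type row admits an even simpler sketch: the newly inserted layer is initialized to the identity, which preserves the network's function for idempotent activations such as ReLU, since relu(relu(h)) == relu(h). This is an illustration under that assumption, not the paper's code; in the paper's Inception-BN setting the batch-normalization scale and shift must also be set to undo the normalization.

```python
def net2deeper(width, noise_std=0.0):
    """Weights for a new layer inserted after an existing width-unit layer."""
    # Identity initialization keeps the composed function unchanged for
    # ReLU-like activations: relu(relu(h) @ I + 0) == relu(h).
    W_new = np.eye(width)
    b_new = np.zeros(width)
    if noise_std > 0.0:
        W_new = W_new + np.random.randn(width, width) * noise_std
    return W_new, b_new
```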