Net2Net: Accelerating Learning via Knowledge Transfer
Authors: Tianqi Chen, Ian Goodfellow, Jonathon Shlens
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our Net2Net operators in three different settings. In all cases we used an Inception-BN network (Ioffe & Szegedy, 2015) trained on ImageNet. In the first setting, we demonstrate that Net2WiderNet can be used to accelerate the training of a standard Inception network by initializing it with a smaller network. In the second setting, we demonstrate that Net2DeeperNet allows us to increase the depth of the Inception modules. Finally, we use our Net2Net operators in a realistic setting, where we make more dramatic changes to the model size and explore the model space for better performance. In this setting, we demonstrate an improved result on ImageNet. |
| Researcher Affiliation | Collaboration | Tianqi Chen , Ian Goodfellow, and Jonathon Shlens Google Inc., Mountain View, CA tqchen@cs.washington.edu, {goodfellow,shlens}@google.com |
| Pseudocode | Yes | Algorithm 1: Net2WiderNet |
| Open Source Code | No | The paper mentions using TensorFlow: "We would like to thank the developers of TensorFlow (Abadi et al., 2015), which we used for all of our experiments." It provides a URL for TensorFlow but does not state that the authors are releasing their own code for the described methodology. |
| Open Datasets | Yes | In all cases we used an Inception-BN network (Ioffe & Szegedy, 2015) trained on ImageNet. |
| Dataset Splits | Yes | "Accuracy on Validation Set" (from the titles of Figures 4, 5, and 6) and "This last approach paid off, yielding a model that sets a new state of the art of 78.5% on our ImageNet validation set." |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "TensorFlow (Abadi et al., 2015)" but does not provide a specific version number for it or any other ancillary software components. |
| Experiment Setup | Yes | We found that the initial learning rate for the student network should be approximately 1/10 the initial learning rate for the teacher network. |
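The Net2WiderNet operator referenced in the pseudocode row widens a hidden layer while preserving the network's function: each new student unit copies the incoming weights of a randomly chosen teacher unit, and each teacher unit's outgoing weights are divided by its replication count. A minimal NumPy sketch for a two-layer fully connected network is shown below; the function name `net2wider` and the use of plain weight matrices (rather than the paper's Inception-BN layers) are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng=None):
    """Widen the hidden layer of a two-layer net (Net2WiderNet sketch).

    w1: (n_in, n) incoming weights; b1: (n,) biases; w2: (n, n_out)
    outgoing weights. Returns student parameters of hidden width
    new_width >= n that compute the same function as the teacher.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = w1.shape[1]
    assert new_width >= n, "student must be at least as wide as teacher"
    # g maps each student unit to the teacher unit it replicates:
    # identity for the first n units, random choices for the extras.
    g = np.concatenate([np.arange(n), rng.integers(0, n, new_width - n)])
    counts = np.bincount(g, minlength=n)   # replication count per teacher unit
    u1 = w1[:, g]                          # copy incoming weights
    c1 = b1[g]                             # copy biases
    u2 = w2[g, :] / counts[g][:, None]     # divide outgoing weights by count
    return u1, c1, u2
```

Because the division happens on the outgoing weights, the replicated activations sum back to the teacher's contribution, so with a ReLU hidden layer the student's output matches the teacher's exactly before fine-tuning begins.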