Introspection: Accelerating Neural Network Training By Learning Weight Evolution
Authors: Abhishek Sinha, Aahitagni Mukherjee, Mausoom Sarkar, Balaji Krishnamurthy
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use a neural network to learn the training pattern from MNIST classification and utilize it to accelerate training of neural networks used for CIFAR-10 and ImageNet classification. Our method has a low memory footprint and is computationally efficient. This method can also be used with other optimizers to give faster convergence. The results indicate a general trend in the weight evolution during training of neural networks. (A sketch of this scheme follows the table.) |
| Researcher Affiliation | Collaboration | Abhishek Sinha, Department of Electronics and Electrical Comm. Engg., IIT Kharagpur, West Bengal, India (abhishek.sinha94 at gmail dot com); Mausoom Sarkar, Adobe Systems Inc, Noida, Uttar Pradesh, India (msarkar at adobe com); Aahitagni Mukherjee, Department of Computer Science, IIT Kanpur, Uttar Pradesh, India (ahitagnimukherjeeam at gmail dot com); Balaji Krishnamurthy, Adobe Systems Inc, Noida, Uttar Pradesh, India (kbalaji at adobe com) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links or statements about open-sourcing code. |
| Open Datasets | Yes | The introspection network I is trained on the training history of the weights of a network N0 which was trained on the MNIST dataset. |
| Dataset Splits | No | The final training loss obtained was 3.1 and the validation loss of the final trained model was 3.4. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | The introspection network I is trained on the training history of the weights of a network N0 which was trained on the MNIST dataset. The network N0 consisted of 3 convolutional layers and two fully connected layers, with ReLU activation, and was trained with the Adam optimiser. Max pooling (2x2 pool size and a 2x2 stride) was applied after the conv layers, along with dropout applied after the first fc layer. The shapes of the conv layer filters were [5, 5, 1, 8], [5, 5, 8, 16] and [5, 5, 16, 32] respectively, whereas those of the fc layer weights were [512, 1024] and [1024, 10] respectively. The network N0 was trained with a learning rate of 1e-4 and a batch size of 50. (A code sketch of this setup follows the table.) |
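Below is a minimal sketch of the MNIST network N0 described in the Experiment Setup row. The framework (tf.keras), the padding mode, the weight initialisation, and the dropout rate are assumptions not stated in the excerpt; the layer shapes, optimiser, learning rate, and batch size follow the quoted setup. With "same" padding the flattened feature size works out to 4 × 4 × 32 = 512, matching the stated first fc weight shape [512, 1024].

```python
# Sketch of N0: 3 conv layers + 2 fc layers, ReLU, Adam, lr 1e-4, batch size 50.
# Padding mode and dropout rate are assumed values, not stated in the excerpt.
import tensorflow as tf

def build_n0(dropout_rate=0.5):  # dropout rate is an assumed value
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),                          # MNIST input
        tf.keras.layers.Conv2D(8, 5, padding="same", activation="relu"),   # filter [5, 5, 1, 8]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Conv2D(16, 5, padding="same", activation="relu"),  # filter [5, 5, 8, 16]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu"),  # filter [5, 5, 16, 32]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Flatten(),                                         # 4 * 4 * 32 = 512 features
        tf.keras.layers.Dense(1024, activation="relu"),                    # fc weight [512, 1024]
        tf.keras.layers.Dropout(dropout_rate),                             # dropout after first fc layer
        tf.keras.layers.Dense(10),                                         # fc weight [1024, 10]
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),            # lr 1e-4, as stated
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model

# Training with batch size 50, matching the stated setup:
# (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
# model = build_n0()
# model.fit(x_train[..., None] / 255.0, y_train, batch_size=50, epochs=5)
```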
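The Research Type row summarizes the paper's idea: a predictor trained on the weight evolution of N0 is later used to "jump" the weights of other networks forward during their training. The sketch below illustrates that scheme under explicit assumptions: the predictor `introspection_net` is a hypothetical callable applied independently to a short history of each scalar weight, and the history length, snapshot interval, and helper functions (`train_one_step`, `flatten_weights`, `load_flat_weights`) are illustrative, not taken from the excerpt.

```python
# Framework-agnostic sketch of an introspection-style weight jump.
# All names and step counts here are illustrative assumptions.
import numpy as np

def jump_weights(weight_history, introspection_net):
    """weight_history: array [history_len, num_weights] of past snapshots of every
    scalar weight (flattened). Returns one predicted future value per weight."""
    per_weight_inputs = weight_history.T                 # [num_weights, history_len]
    predicted = introspection_net(per_weight_inputs)     # assumed to return [num_weights, 1]
    return np.asarray(predicted).reshape(-1)

# Usage inside an ordinary training loop (illustrative):
# history = []                                   # flattened weight snapshots
# for step in range(max_steps):
#     train_one_step(model, optimizer, next_batch())
#     if step % snapshot_interval == 0:
#         history.append(flatten_weights(model))
#     if step in jump_steps:                     # a few widely spaced steps
#         new_flat = jump_weights(np.stack(history[-4:]), introspection_net)
#         load_flat_weights(model, new_flat)     # resume normal training from the jumped weights
```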