Introspection: Accelerating Neural Network Training By Learning Weight Evolution
Authors: Abhishek Sinha, Aahitagni Mukherjee, Mausoom Sarkar, Balaji Krishnamurthy
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use a neural network to learn the training pattern from MNIST classification and utilize it to accelerate training of neural networks used for CIFAR-10 and ImageNet classification. Our method has a low memory footprint and is computationally efficient. This method can also be used with other optimizers to give faster convergence. The results indicate a general trend in the weight evolution during training of neural networks. (A sketch of this scheme follows the table.) |
| Researcher Affiliation | Collaboration | Abhishek Sinha, Department of Electronics and Electrical Comm. Engg., IIT Kharagpur, West Bengal, India (abhishek.sinha94 at gmail dot com); Mausoom Sarkar, Adobe Systems Inc, Noida, Uttar Pradesh, India (msarkar at adobe com); Aahitagni Mukherjee, Department of Computer Science, IIT Kanpur, Uttar Pradesh, India (ahitagnimukherjeeam at gmail dot com); Balaji Krishnamurthy, Adobe Systems Inc, Noida, Uttar Pradesh, India (kbalaji at adobe com) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links or statements about open-sourcing code. |
| Open Datasets | Yes | The introspection network I is trained on the training history of the weights of a network N0 which was trained on the MNIST dataset. |
| Dataset Splits | No | The final training loss obtained was 3.1 and the validation loss of the final trained model was 3.4. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | The introspection network I is trained on the training history of the weights of a network N0 which was trained on the MNIST dataset. The network N0 consisted of 3 convolutional layers and two fully connected layers, with ReLU activation, and was trained with the Adam optimiser. Max pooling (2x2 pool size and a 2x2 stride) was applied after the conv layers, along with dropout applied after the first fc layer. The shapes of the conv layer filters were [5, 5, 1, 8], [5, 5, 8, 16] and [5, 5, 16, 32] respectively, whereas those of the fc layer weights were [512, 1024] and [1024, 10] respectively. The network N0 was trained with a learning rate of 1e-4 and a batch size of 50. (A code sketch of this setup follows the table.) |
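Below is a minimal sketch of the MNIST network N0 described in the Experiment Setup row. The framework (tf.keras), the padding mode, the weight initialisation, and the dropout rate are assumptions not stated in the excerpt; the layer shapes, optimiser, learning rate, and batch size follow the quoted setup. With "same" padding the flattened feature size works out to 4 × 4 × 32 = 512, matching the stated first fc weight shape [512, 1024].

```python
# Sketch of N0: 3 conv layers + 2 fc layers, ReLU, Adam, lr 1e-4, batch size 50.
# Padding mode and dropout rate are assumed values, not stated in the excerpt.
import tensorflow as tf

def build_n0(dropout_rate=0.5):  # dropout rate is an assumed value
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),                          # MNIST input
        tf.keras.layers.Conv2D(8, 5, padding="same", activation="relu"),   # filter [5, 5, 1, 8]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Conv2D(16, 5, padding="same", activation="relu"),  # filter [5, 5, 8, 16]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu"),  # filter [5, 5, 16, 32]
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same"),
        tf.keras.layers.Flatten(),                                         # 4 * 4 * 32 = 512 features
        tf.keras.layers.Dense(1024, activation="relu"),                    # fc weight [512, 1024]
        tf.keras.layers.Dropout(dropout_rate),                             # dropout after first fc layer
        tf.keras.layers.Dense(10),                                         # fc weight [1024, 10]
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),            # lr 1e-4, as stated
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model

# Training with batch size 50, matching the stated setup:
# (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
# model = build_n0()
# model.fit(x_train[..., None] / 255.0, y_train, batch_size=50, epochs=5)
```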
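The Research Type row summarizes the paper's idea: a predictor trained on the weight evolution of N0 is later used to "jump" the weights of other networks forward during their training. The sketch below illustrates that scheme under explicit assumptions: the predictor `introspection_net` is a hypothetical callable applied independently to a short history of each scalar weight, and the history length, snapshot interval, and helper functions (`train_one_step`, `flatten_weights`, `load_flat_weights`) are illustrative, not taken from the excerpt.

```python
# Framework-agnostic sketch of an introspection-style weight jump.
# All names and step counts here are illustrative assumptions.
import numpy as np

def jump_weights(weight_history, introspection_net):
    """weight_history: array [history_len, num_weights] of past snapshots of every
    scalar weight (flattened). Returns one predicted future value per weight."""
    per_weight_inputs = weight_history.T                 # [num_weights, history_len]
    predicted = introspection_net(per_weight_inputs)     # assumed to return [num_weights, 1]
    return np.asarray(predicted).reshape(-1)

# Usage inside an ordinary training loop (illustrative):
# history = []                                   # flattened weight snapshots
# for step in range(max_steps):
#     train_one_step(model, optimizer, next_batch())
#     if step % snapshot_interval == 0:
#         history.append(flatten_weights(model))
#     if step in jump_steps:                     # a few widely spaced steps
#         new_flat = jump_weights(np.stack(history[-4:]), introspection_net)
#         load_flat_weights(model, new_flat)     # resume normal training from the jumped weights
```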