Identity Matters in Deep Learning
Authors: Moritz Hardt, Tengyu Ma
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we put the principle of identity parameterization on a more solid theoretical footing alongside further empirical progress. ... Directly inspired by our theory, we experiment with a radically simple residual architecture consisting of only residual convolutional layers and ReLU activations, but no batch normalization, dropout, or max pool. Our model improves significantly on previous all-convolutional networks on the CIFAR10, CIFAR100, and ImageNet classification benchmarks. *(See the residual-block sketch below this table.)* |
| Researcher Affiliation | Collaboration | Moritz Hardt, Google Brain, 1600 Amphitheatre Parkway, Mountain View, CA 94043, m@mrtz.org; Tengyu Ma, Department of Computer Science, Princeton University, 35 Olden Street, Princeton, 08540, tengyu@cs.princeton.edu |
| Pseudocode | No | No pseudocode or algorithm block found. The paper primarily uses mathematical equations and descriptive text. |
| Open Source Code | No | Our code can be easily derived from an open source implementation (https://github.com/tensorflow/models/tree/master/resnet) by removing batch normalization, adjusting the residual components and model architecture. |
| Open Datasets | Yes | Our model improves significantly on previous all-convolutional networks on the CIFAR10, CIFAR100, and ImageNet classification benchmarks. ... Inspired by our theory, we experimented with all-convolutional residual networks on standard image classification benchmarks. 4.1 CIFAR10 AND CIFAR100 ... The ImageNet ILSVRC 2012 data set has 1,281,167 data points with 1000 classes. |
| Dataset Splits | No | The ImageNet ILSVRC 2012 data set has 1,281,167 data points with 1000 classes. ... Our model still reached 35.29% top-1 classification error on the test set (50000 data points)... An interesting aspect of our model is that despite its massive size of 13.59 million trainable parameters, the model does not seem to overfit too quickly even though the data set size is 50000. In contrast, we found it difficult to train a model with batch normalization of this size without significant overfitting on CIFAR10. |
| Hardware Specification | Yes | Our model reaches peak performance at around 50k steps, which takes about 24h on a single NVIDIA Tesla K40 GPU. ... Training was distributed across 6 machines updating asynchronously. Each machine was equipped with 8 GPUs (NVIDIA Tesla K40) and used batch size 256 split across the 8 GPUs so that each GPU updated with batches of size 32. |
| Software Dependencies | No | We trained our models with the TensorFlow framework, using a momentum optimizer with momentum 0.9 and a batch size of 128. |
| Experiment Setup | Yes | We trained our models with the TensorFlow framework, using a momentum optimizer with momentum 0.9 and a batch size of 128. All convolutional weights are trained with weight decay 0.0001. The initial learning rate is 0.05, which drops by a factor of 10 at 30000 and 50000 steps. ... We trained the model with a momentum optimizer (with momentum 0.9) and a learning rate schedule that decays by a factor of 0.94 every two epochs, starting from the initial learning rate of 0.1. ... used batch size 256 split across the 8 GPUs so that each GPU updated with batches of size 32. *(See the training-schedule sketch below this table.)* |
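
The architecture quoted in the Research Type row above is an all-convolutional residual network built from residual convolutional layers and ReLU activations, with no batch normalization, dropout, or max pooling. The following is a minimal, hypothetical sketch of such a batch-norm-free residual block in TensorFlow/Keras; the filter count and kernel size are illustrative placeholders, not the authors' exact configuration.

```python
import tensorflow as tf

def residual_block(x, filters, kernel_size=3):
    """One batch-norm-free residual unit: two convolutions with a ReLU
    in between, added back onto the identity (skip) path.

    Assumes the input tensor already has `filters` channels so the
    element-wise addition is well defined.
    """
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, kernel_size, padding="same")(y)
    return tf.keras.layers.Add()([shortcut, y])  # identity skip, no batch norm
```

Stacking such blocks (with occasional strided convolutions to change resolution) gives an all-convolutional network in the spirit of the paper; the open source TensorFlow ResNet code referenced in the Open Source Code row would be the authoritative starting point.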
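
The Experiment Setup row quotes two learning-rate schedules: CIFAR training starting at 0.05 and dropping by a factor of 10 at 30000 and 50000 steps, and ImageNet training starting at 0.1 and decaying by 0.94 every two epochs. The sketch below shows how these schedules could be expressed with TensorFlow's built-in schedule classes; `steps_per_epoch` is an assumed placeholder derived from the quoted dataset and batch sizes, and the weight decay of 0.0001 would be attached to the convolutional weights separately (for example as an L2 kernel regularizer).

```python
import tensorflow as tf

# CIFAR: start at 0.05, divide by 10 at 30k steps and again at 50k steps.
cifar_lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[30000, 50000],
    values=[0.05, 0.005, 0.0005],
)
cifar_optimizer = tf.keras.optimizers.SGD(learning_rate=cifar_lr, momentum=0.9)

# ImageNet: start at 0.1, decay by 0.94 every two epochs (epoch length assumed
# from the quoted 1,281,167 images and global batch size 256).
steps_per_epoch = 1281167 // 256
imagenet_lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=2 * steps_per_epoch,
    decay_rate=0.94,
    staircase=True,
)
imagenet_optimizer = tf.keras.optimizers.SGD(learning_rate=imagenet_lr, momentum=0.9)
```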