reproducibilityindex.ai

Highway and Residual Networks learn Unrolled Iterative Estimation

Authors: Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber

ICLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we present some preliminary experiments to compare these two architectures and investigate some of their relative advantages and disadvantages.
Researcher Affiliation	Collaboration	Klaus Greff (The Swiss AI Lab IDSIA, USI-SUPSI); Rupesh K. Srivastava & Jürgen Schmidhuber (The Swiss AI Lab IDSIA, USI-SUPSI, and NNAISENSE, Lugano, Switzerland); {klaus,rupesh,juergen}@idsia.ch
Pseudocode	No	The paper contains mathematical derivations and equations (e.g., Equations 2–16) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets	Yes	To empirically test this claim, we extract the intermediate layer outputs for 5000 validation set images using the 50-layer ResNet trained on the ILSVRC-2015 dataset from He et al. (2015). We train a 50-layer convolutional Highway network based on the 50-layer Residual network from He et al. (2015). The design of the two networks is identical (including use of batch normalization (BN) after every convolution operation), except that unlike Residual blocks, the Highway blocks use two sets of layers to learn H and T and then combine them using the coupled Highway formulation.
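The difference between the two block types can be made concrete. A Residual block computes y = x + F(x), while the coupled Highway formulation computes y = H(x)·T(x) + x·(1 − T(x)), with the transform gate T squashed by a sigmoid. The sketch below is a minimal, hypothetical illustration on plain Python lists: each of F, H and T is a single dense layer (hypothetical weights `W_f`, `W_h`, `W_t` and biases), with tanh chosen for F and H. It omits the convolutions and batch normalization used in the paper's networks.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, x):
    # Affine map W @ x + b, with W given as a list of rows.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def residual_block(x, W_f, b_f):
    # Residual block: y = x + F(x); F is one tanh layer (hypothetical choice).
    f = [math.tanh(v) for v in dense(W_f, b_f, x)]
    return [x_i + f_i for x_i, f_i in zip(x, f)]


def highway_block(x, W_h, b_h, W_t, b_t):
    # Coupled Highway block: y = H(x) * T(x) + x * (1 - T(x)), elementwise.
    # H and T come from two separate sets of weights, as in the description.
    h = [math.tanh(v) for v in dense(W_h, b_h, x)]
    t = [sigmoid(v) for v in dense(W_t, b_t, x)]
    return [h_i * t_i + x_i * (1.0 - t_i) for h_i, t_i, x_i in zip(h, t, x)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With zero weights F(x) = 0, so the residual block passes x through unchanged.
    print(residual_block(x, zeros, [0.0, 0.0]))
    # A strongly negative gate bias drives T toward 0, so the Highway block
    # also approximately carries x through unchanged.
    print(highway_block(x, zeros, [0.0, 0.0], zeros, [-50.0, -50.0]))
```

Both blocks can represent the identity mapping, which is what allows a stack of them to act as successive refinements of the same representation; the Highway block additionally learns, per unit, how much of the new estimate H to mix in.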
Dataset Splits	Yes	The final performance of both networks on the validation set (see Table 1b) is very similar, with the Residual network producing a slightly better top-5 classification error of 7.17% vs. 7.53% for the Highway network.
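For reference, top-5 error is the fraction of images whose true class is not among the five classes the model scores highest. The following is a small, illustrative sketch of that standard definition, assuming per-image class scores are available as lists; the function name `top5_error` is hypothetical and not taken from the paper.

```python
def top5_error(scores, labels):
    """Fraction of examples whose true label is not among the 5 top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the five highest scores for this example.
        top5 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    scores = [[7, 6, 5, 4, 3, 2, 1],
              [7, 6, 5, 4, 3, 2, 1]]
    # Label 4 is ranked fifth (a hit); label 6 is ranked last (a miss).
    print(top5_error(scores, [4, 6]))
```

Under this definition, an error of 7.17% means that for 7.17% of the evaluated validation images the correct class was missing from the model's five best guesses.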
Hardware Specification	Yes	We are grateful to NVIDIA Corporation for providing us a DGX-1 as part of the Pioneers of AI Research award.
Software Dependencies	No	The paper mentions using specific frameworks or setups (e.g., 'using the same setup and code provided by Kim et al. (2015)') but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup	Yes	The transform gate biases are set to 1 at the start of training. For fair comparison, the number of feature maps throughout the Highway network is reduced such that the total number of parameters is close to the Residual network. The training algorithm and learning rate schedule are kept the same as those used for the Residual network.
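Two details of this setup can be checked with a little arithmetic. First, since T = sigmoid(Wx + b), a gate bias of 1 gives T ≈ sigmoid(1) ≈ 0.73 whenever the gate's weighted input is near zero, so at initialization the coupled Highway block leans toward the transform path H rather than the carry path. Second, because each Highway block learns both H and T, matching the Residual network's parameter count requires narrower layers. The width rule below is a hypothetical back-of-the-envelope approximation (convolution parameters scaling roughly with width squared, two equal-width branches per block); the paper does not state the exact widths it used.

```python
import math


def initial_transform_gate(bias: float, pre_activation: float = 0.0) -> float:
    # T = sigmoid(W x + b); with W x close to 0 at initialization, T is about sigmoid(b).
    return 1.0 / (1.0 + math.exp(-(pre_activation + bias)))


def matched_width(residual_width: int, branches: int = 2) -> int:
    # Hypothetical rule of thumb: a k x k convolution with w input and w output
    # channels has about k*k*w*w weights. Giving the Highway block `branches`
    # equal-width stacks (H and T) at width w' matches a single residual stack
    # of width w when branches * w'**2 is about w**2, i.e. w' = w / sqrt(branches).
    return round(residual_width / math.sqrt(branches))


if __name__ == "__main__":
    t0 = initial_transform_gate(1.0)
    print(f"initial T = {t0:.4f}, carry weight 1 - T = {1 - t0:.4f}")
    for w in (64, 128, 256, 512):
        print(w, "->", matched_width(w))
```

In practice, bottleneck blocks, batch-normalization parameters and the classifier layer shift the exact numbers, which is why the paper only reports that the total parameter counts are "close".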