Highway and Residual Networks learn Unrolled Iterative Estimation

Authors: Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we present some preliminary experiments to compare these two architectures and investigate some of their relative advantages and disadvantages.
Researcher Affiliation | Collaboration | Klaus Greff The Swiss AI Lab IDSIA (USI-SUPSI) Rupesh K. Srivastava & Jürgen Schmidhuber The Swiss AI Lab IDSIA (USI-SUPSI) & NNAISENSE, Lugano, Switzerland {klaus,rupesh,juergen}@idsia.ch
Pseudocode | No | The paper contains mathematical derivations and equations (e.g., Equations 2-16) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | To empirically test this claim, we extract the intermediate layer outputs for 5000 validation set images using the 50-layer ResNet trained on the ILSVRC-2015 dataset from He et al. (2015). We train a 50-layer convolutional Highway network based on the 50-layer Residual network from He et al. (2015). The design of the two networks are identical (including use of batch normalization (BN) after every convolution operation), except that unlike Residual blocks, the Highway blocks use two sets of layers to learn H and T and then combine them using the coupled Highway formulation.
Dataset Splits | Yes | The final performance of both networks on the validation set (see Table 1b) is very similar, with the Residual network producing a slightly better top-5 classification error of 7.17% vs. 7.53% for the Highway network.
Hardware Specification | Yes | We are grateful to NVIDIA Corporation for providing us a DGX-1 as part of the Pioneers of AI Research award.
Software Dependencies | No | The paper mentions using specific frameworks or setups (e.g., 'using the same setup and code provided by Kim et al. (2015)') but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup | Yes | The transform gate biases are set to 1 at the start of training. For fair comparison, the number of feature maps throughout the Highway network is reduced such that the total number of parameters is close to the Residual network. The training algorithm and learning rate schedule are kept the same as those used for the Residual network.
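
The block comparison quoted above (a Residual block versus a Highway block that learns H and T and combines them in the coupled form) can be illustrated with a minimal sketch. It is not the authors' code: it assumes the usual coupled Highway form y = H(x) * T(x) + x * (1 - T(x)) and the residual form y = x + F(x), and it substitutes small dense tanh layers for the paper's convolution and batch-normalization stacks. All function names and the toy input are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(x, weights, bias):
    """Compute weights @ x + bias using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w_f, b_f):
    """Residual form: y = x + F(x). F is a single tanh layer (an assumption)."""
    f = [math.tanh(v) for v in affine(x, w_f, b_f)]
    return [xi + fi for xi, fi in zip(x, f)]


def coupled_highway_block(x, w_h, b_h, w_t, b_t):
    """Coupled Highway form: y = H(x) * T(x) + x * (1 - T(x)).

    The carry gate is tied to the transform gate as 1 - T, so only
    H and T have parameters (two sets of layers per block).
    """
    h = [math.tanh(v) for v in affine(x, w_h, b_h)]
    t = [sigmoid(v) for v in affine(x, w_t, b_t)]
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


def zeros(n):
    return [[0.0] * n for _ in range(n)]


x = [0.5, -1.0, 2.0]  # hypothetical toy input
n = len(x)

# Transform gate bias set to 1, as in the quoted setup; all weights are
# zero here purely to make the gate's effect easy to read off.
y_highway = coupled_highway_block(x, zeros(n), [0.0] * n, zeros(n), [1.0] * n)
y_residual = residual_block(x, zeros(n), [0.0] * n)
```

With all weights at zero, the residual block returns its input unchanged, while the highway block scales it by 1 - sigmoid(1), since H(x) is zero and the gate depends only on its bias. The sketch also shows why the quoted setup reduces the number of feature maps: at equal width a Highway block carries two weight sets (for H and T) where this residual block carries one.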