Highway and Residual Networks learn Unrolled Iterative Estimation
Authors: Klaus Greff, Rupesh K. Srivastava, Jürgen Schmidhuber
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present some preliminary experiments to compare these two architectures and investigate some of their relative advantages and disadvantages. |
| Researcher Affiliation | Collaboration | Klaus Greff, The Swiss AI Lab IDSIA (USI-SUPSI); Rupesh K. Srivastava & Jürgen Schmidhuber, The Swiss AI Lab IDSIA (USI-SUPSI) & NNAISENSE, Lugano, Switzerland; {klaus,rupesh,juergen}@idsia.ch |
| Pseudocode | No | The paper contains mathematical derivations and equations (e.g., Equations 2–16) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | To empirically test this claim, we extract the intermediate layer outputs for 5000 validation set images using the 50-layer ResNet trained on the ILSVRC-2015 dataset from He et al. (2015). We train a 50-layer convolutional Highway network based on the 50-layer Residual network from He et al. (2015). The designs of the two networks are identical (including use of batch normalization (BN) after every convolution operation), except that unlike Residual blocks, the Highway blocks use two sets of layers to learn H and T and then combine them using the coupled Highway formulation (a minimal sketch of such a block follows the table). |
| Dataset Splits | Yes | The final performance of both networks on the validation set (see Table 1b) is very similar, with the Residual network producing a slightly better top-5 classification error of 7.17% vs. 7.53% for the Highway network. |
| Hardware Specification | Yes | We are grateful to NVIDIA Corporation for providing us a DGX-1 as part of the Pioneers of AI Research award. |
| Software Dependencies | No | The paper mentions using specific frameworks or setups (e.g., 'using the same setup and code provided by Kim et al. (2015)') but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x). |
| Experiment Setup | Yes | The transform gate biases are set to 1 at the start of training. For fair comparison, the number of feature maps throughout the Highway network is reduced such that the total number of parameters is close to the Residual network. The training algorithm and learning rate schedule are kept the same as those used for the Residual network. |
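
The dataset and setup rows above describe convolutional Highway blocks that learn H and T with two sets of layers, combine them via the coupled Highway formulation, apply batch normalization after every convolution, and start training with transform-gate biases set to 1. The PyTorch-style sketch below illustrates such a block alongside a plain residual block. The branch structure (a single 3x3 convolution per branch) and the placement of the +1 bias on the gate's batch-norm shift are assumptions for illustration; the report does not specify these details.

```python
import torch
import torch.nn as nn


class CoupledHighwayBlock(nn.Module):
    """Sketch of a convolutional coupled-Highway block:
    y = T(x) * H(x) + (1 - T(x)) * x.

    As in the paper's setup, batch normalization follows every convolution
    and the transform-gate bias starts at 1. The single 3x3 convolution per
    branch and the bias placement on the BN shift are illustrative assumptions.
    """

    def __init__(self, channels: int):
        super().__init__()
        # H(x): the candidate transformation (one set of layers).
        self.h_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.h_bn = nn.BatchNorm2d(channels)
        # T(x): the transform gate (a second set of layers).
        self.t_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.t_bn = nn.BatchNorm2d(channels)
        # "Transform gate biases are set to 1 at the start of training";
        # here the +1 is placed on the gate's BN shift (an assumption).
        nn.init.constant_(self.t_bn.bias, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.h_bn(self.h_conv(x)))
        t = torch.sigmoid(self.t_bn(self.t_conv(x)))
        # Coupled gates: the carry gate is tied to 1 - T(x).
        return t * h + (1.0 - t) * x


class ResidualBlock(nn.Module):
    """Minimal residual counterpart: y = x + F(x), with BN after the convolution."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.relu(self.bn(self.conv(x)))
```

In the coupled formulation a single gate decides how much of the block's input is copied versus transformed, whereas a residual block always adds its transformation to the identity path; the report's setup row notes that the Highway network's feature maps were reduced so that its parameter count matches the Residual baseline under the same training algorithm and learning rate schedule.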