Residual Connections Encourage Iterative Inference

Authors: Stanisław Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study Resnets both analytically and empirically. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. (The residual recurrence behind this reading is sketched after the table.)
Researcher Affiliation | Collaboration | Stanisław Jastrzębski (1,2), Devansh Arpit (2), Nicolas Ballas (3), Vikas Verma (5), Tong Che (2) & Yoshua Bengio (2,6); 1 Jagiellonian University, Cracow, Poland; 2 MILA, Université de Montréal, Canada; 3 Facebook, Montreal, Canada; 4 University of Bonn, Bonn, Germany; 5 Aalto University, Finland; 6 CIFAR Senior Fellow
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Experiments are performed on CIFAR-10 (Krizhevsky & Hinton, 2009) and CIFAR-100 (see appendix). (A data-loading sketch follows the table.)
Dataset Splits | No | The paper uses CIFAR-10 and CIFAR-100 and refers to validation sets in its empirical analysis (e.g., the train/validation curves in Figures 2-4 and the discussion of validation accuracy in Section 4.2). However, it does not state the percentages or procedure used to create the training, validation, and test splits, nor does it say that it relies on the predefined standard splits for these datasets.
Hardware Specification | No | The paper only states 'We acknowledge the computing resources provided by Compute Canada and Calcul Quebec,' which does not specify any particular hardware components (e.g., GPU model, CPU model, memory details) used for the experiments.
Software Dependencies | No | The paper describes various components and training procedures, such as 'BatchNorm-ReLU-Conv' blocks and 'SGD with momentum 0.9', but it does not specify any version numbers for the software libraries, frameworks, or programming languages used (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | Experimental details: For all architectures, we use He-normal weight initialization as suggested in He et al. (2015), and biases are initialized to 0. For residual blocks, we use BatchNorm-ReLU-Conv-BatchNorm-ReLU-Conv as suggested in He et al. (2016b). The classifier is composed of the following elements: BatchNorm-ReLU-AveragePool(8,8)-Flatten-Fully-Connected-Layer(#classes)-Softmax. ... For all experiments with the single-representation and pooling Resnet architectures, we use SGD with momentum 0.9 and train for 200 and 100 epochs (respectively) with learning rate 0.1 until epoch 40, 0.02 until 60, 0.004 until 80, and 0.0008 afterwards. For the original Resnet we use SGD with momentum 0.9 and train for 300 epochs with learning rate 0.1 until epoch 80, 0.01 until 120, 0.001 until 200, 0.00001 until 240 and 0.000011 afterwards. We use data augmentation (horizontal flipping and translation) during training of all architectures. (A training-setup sketch follows the table.)
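
The iterative-refinement reading quoted in the Research Type row rests on unrolling the residual recurrence: each block adds a small update to the representation it receives. A minimal LaTeX rendering of that standard unrolling follows; the symbols h_i and F_i are our notation, not necessarily the paper's.

    % Each residual block refines its input; the final representation is the
    % initial one plus a sum of block-wise updates.
    \begin{align}
      h_{i+1} &= h_i + F_i(h_i), \\
      h_L     &= h_0 + \sum_{i=0}^{L-1} F_i(h_i).
    \end{align}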
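
For the Open Datasets row, here is a minimal sketch of loading the two public datasets with the augmentations quoted in the Experiment Setup row. The use of torchvision and the padded random crop as the "translation" augmentation are assumptions; the paper names neither a library nor exact crop parameters.

    # Hypothetical data-loading sketch; library choice and crop padding are assumptions.
    from torchvision import datasets, transforms

    train_tf = transforms.Compose([
        transforms.RandomHorizontalFlip(),     # horizontal flipping, as stated in the paper
        transforms.RandomCrop(32, padding=4),  # one common way to implement "translation" (assumed)
        transforms.ToTensor(),
    ])
    cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=train_tf)
    cifar100 = datasets.CIFAR100(root="./data", train=True, download=True, transform=train_tf)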
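
For the Experiment Setup row, here is a minimal PyTorch sketch of the pre-activation residual block, the He-normal initialization, and the SGD schedule quoted for the single-representation and pooling variants (learning rate 0.1, dropped to 0.02, 0.004, and 0.0008 at epochs 40, 60, and 80). PyTorch itself, the channel width, and the block count are assumptions; only the layer ordering, optimizer, momentum, and milestones come from the paper.

    # Hypothetical PyTorch sketch; framework, widths, and depth are assumptions.
    import torch
    import torch.nn as nn

    class PreActBlock(nn.Module):
        """BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Conv, plus identity shortcut."""
        def __init__(self, channels):
            super().__init__()
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

        def forward(self, x):
            out = self.conv1(torch.relu(self.bn1(x)))
            out = self.conv2(torch.relu(self.bn2(out)))
            return x + out  # residual connection

    model = nn.Sequential(*[PreActBlock(16) for _ in range(5)])  # depth/width illustrative only
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, nonlinearity='relu')  # He-normal initialization

    # SGD with momentum 0.9; LR 0.1 until epoch 40, 0.02 until 60, 0.004 until 80, 0.0008 after.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 60, 80], gamma=0.2)

Calling scheduler.step() once per epoch reproduces the quoted milestones, since each step multiplies the learning rate by 0.2 at epochs 40, 60, and 80.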