Decoupling the Layers in Residual Networks

Authors: Ricky Fok, Aijun An, Zana Rashidi, Xiaogang Wang

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate through an extensive performance study that the proposed network achieves comparable predictive performance to the original residual network with the same number of parameters, while achieving a significant speed-up on the total training time.
Researcher Affiliation | Academia | Ricky Fok, Aijun An, Zana Rashidi, Department of Electrical Engineering and Computer Science, York University, 4700 Keele Street, Toronto, M3J 1P3, Canada (ricky.fok3@gmail.com, aan@cse.yorku.ca, zrashidi@cse.yorku.ca); Xiaogang Wang, Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, M3J 1P3, Canada (stevenw@mathstat.yorku.ca)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper only acknowledges third-party code: 'We thank Wenxin Xu for providing his code for ResNet at https://github.com/wenxinxu/resnet_in_tensorflow.' It does not provide concrete access to the authors' own source code for the methodology described.
Open Datasets | Yes | For the CIFAR-10 and CIFAR-100 data sets, we trained for 80000 iterations, or 204 epochs. We also tested WarpNet on a down-sampled (32x32) ImageNet data set (Chrabaszcz & Hutter, 2017).
Dataset Splits | Yes | For the CIFAR-10 and CIFAR-100 data sets, we trained for 80000 iterations, or 204 epochs. We took a training batch size of 128. Initial learning rate is 0.1. The learning rate drops by a factor of 0.1 at epochs 60, 120, and 160, with a weight decay of 0.0005. The data set contains 1000 classes with 1281167 training images and 50000 validation images with 50 images each class.
Hardware Specification | No | The paper mentions 'GPUs' and memory constraints ('requires too much memory on a single GPU') but does not specify exact GPU models, CPU models, or other detailed computer specifications used for the experiments.
Software Dependencies | No | The paper mentions using 'Tensorflow' for implementation but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | For the CIFAR-10 and CIFAR-100 data sets, we trained for 80000 iterations, or 204 epochs. We took a training batch size of 128. Initial learning rate is 0.1. The learning rate drops by a factor of 0.1 at epochs 60, 120, and 160, with a weight decay of 0.0005. [For the down-sampled ImageNet experiments:] The training batch size is 512, initial learning rate is 0.4 and drops by a factor of 0.1 at every 30 epochs. The weight decay is set to be 0.0001. (A sketch of these two schedules follows the table.)
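As a reading aid, the two training schedules quoted in the Experiment Setup row can be written out explicitly. The following is a minimal Python sketch based only on the hyperparameters quoted above; it assumes a plain step decay with no warm-up, and the function names (cifar_lr, imagenet32_lr) are illustrative rather than taken from the authors' code.

```python
# Hedged sketch of the two learning-rate schedules quoted in the table.
# Assumptions (not stated in the report): plain step decay, no warm-up,
# epochs counted from 0. Weight decay and batch size are noted in comments only.

def cifar_lr(epoch: int, base_lr: float = 0.1) -> float:
    """CIFAR-10/100 schedule: start at 0.1, drop by 10x at epochs 60, 120, 160.
    (Training runs 80000 iterations / 204 epochs, batch size 128, weight decay 0.0005.)"""
    lr = base_lr
    for boundary in (60, 120, 160):
        if epoch >= boundary:
            lr *= 0.1
    return lr


def imagenet32_lr(epoch: int, base_lr: float = 0.4) -> float:
    """Down-sampled (32x32) ImageNet schedule: start at 0.4, drop by 10x every 30 epochs.
    (Batch size 512, weight decay 0.0001.)"""
    return base_lr * (0.1 ** (epoch // 30))


if __name__ == "__main__":
    # Spot-check the drop points quoted in the Experiment Setup row.
    print([cifar_lr(e) for e in (0, 59, 60, 120, 160)])      # ~0.1, 0.1, 0.01, 0.001, 0.0001
    print([imagenet32_lr(e) for e in (0, 29, 30, 60, 90)])   # ~0.4, 0.4, 0.04, 0.004, 0.0004
```

In a TensorFlow implementation (the framework named under Software Dependencies), the same piecewise-constant schedule could instead be expressed with tf.keras.optimizers.schedules.PiecewiseConstantDecay, with boundaries given in iterations rather than epochs.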