Learning Implicitly Recurrent CNNs Through Parameter Sharing

Authors: Pedro Savarese, Michael Maire

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate substantial parameter savings on standard image classification tasks, while maintaining accuracy." ... "Table 1: Test error (%) on CIFAR-10 and CIFAR-100." ... "Table 3: ImageNet classification results" |
| Researcher Affiliation | Academia | Pedro Savarese, TTI-Chicago (savarese@ttic.edu); Michael Maire, University of Chicago (mmaire@uchicago.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our code is available at https://github.com/lolemacs/soft-sharing" |
| Open Datasets | Yes | "The CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009) are composed of 60,000 colored 32×32 images, labeled among 10 and 100 classes respectively, and split into 50,000 and 10,000 examples for training and testing." "We use the ILSVRC 2012 dataset (Russakovsky et al., 2015) as a stronger test of our method. It is composed of 1.2M training and 50,000 validation images, drawn from 1000 classes." |
| Dataset Splits | Yes | "The CIFAR-10 and CIFAR-100 datasets ... split into 50,000 and 10,000 examples for training and testing." "We use the ILSVRC 2012 dataset (Russakovsky et al., 2015) ... It is composed of 1.2M training and 50,000 validation images, drawn from 1000 classes." (See the data-loading sketch after this table.) |
| Hardware Specification | Yes | "We achieve 2.69% test error after training less than 10 hours on a single NVIDIA GTX 1080 Ti." |
| Software Dependencies | No | The paper mentions using the Adam optimizer and SGD, but does not provide version numbers for any software, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | "Following Zagoruyko & Komodakis (2016), we train each model for 200 epochs with SGD and Nesterov momentum of 0.9 and a batch size of 128. The learning rate is initially set to 0.1 and decays by a factor of 5 at epochs 60, 120 and 160. We also apply weight decay of 5×10⁻⁴ on all parameters except for the coefficients α." ... "Each model trains for 50 epochs per phase with Adam (Kingma & Ba, 2015) and a fixed learning rate of 0.01." (See the training-configuration sketch after this table.) |
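The sketch below illustrates the Open Datasets and Dataset Splits rows: a minimal data-loading example, assuming PyTorch and torchvision rather than anything taken from the paper's repository. CIFAR-10/100 ship with a fixed 50,000/10,000 train/test split, so no manual splitting is needed; the augmentation and normalization statistics are the commonly used choices in the Zagoruyko & Komodakis (2016) setup the paper follows, not values quoted in the paper.

```python
# Minimal sketch (assumed PyTorch/torchvision, not the paper's code) showing the
# fixed CIFAR-10 train/test split reported above.
import torch
import torchvision
import torchvision.transforms as transforms

# Commonly used CIFAR-10 normalization statistics (an assumption, not from the paper).
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # standard CIFAR augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
test_transform = transforms.Compose([transforms.ToTensor(), normalize])

# The built-in train/test split matches the 50,000/10,000 figures quoted above.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=test_transform)

# Batch size 128, as in the Experiment Setup row.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128,
                                          shuffle=False, num_workers=4)

print(len(train_set), len(test_set))  # 50000 10000
```

Swapping `CIFAR10` for `CIFAR100` gives the corresponding 100-class split with the same 50,000/10,000 sizes.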
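The next sketch illustrates the Experiment Setup row: a minimal PyTorch rendering of the quoted CIFAR schedule (SGD with Nesterov momentum 0.9, initial learning rate 0.1 decayed by a factor of 5 at epochs 60, 120 and 160, weight decay 5×10⁻⁴ on everything except the coefficients α). The `model` argument and the assumption that the α coefficients can be identified by the name "alpha" are hypothetical; the paper's actual implementation may organize parameters differently.

```python
# Minimal sketch (assumed PyTorch; the "alpha" naming convention is hypothetical)
# of the training configuration quoted in the Experiment Setup row.
import torch

def build_optimizer_and_scheduler(model):
    # Two parameter groups so that weight decay skips the sharing coefficients alpha.
    alpha_params = [p for n, p in model.named_parameters() if "alpha" in n]
    other_params = [p for n, p in model.named_parameters() if "alpha" not in n]

    optimizer = torch.optim.SGD(
        [
            {"params": other_params, "weight_decay": 5e-4},  # weight decay 5e-4
            {"params": alpha_params, "weight_decay": 0.0},   # no decay on alpha
        ],
        lr=0.1, momentum=0.9, nesterov=True,
    )
    # Decaying the learning rate by a factor of 5 corresponds to gamma = 0.2.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 120, 160], gamma=0.2
    )
    return optimizer, scheduler
```

The second quoted setting (50 epochs per phase with Adam at a fixed learning rate of 0.01) would instead correspond to something like `torch.optim.Adam(model.parameters(), lr=0.01)` with no learning-rate schedule.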