Jointly-Learned Exit and Inference for a Dynamic Neural Network
Authors: Florence Regol, Joud Chataoui, Mark Coates
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We use the vision transformers T2T-ViT-7 and T2T-ViT-14 (Yuan et al., 2021) pretrained on the ImageNet dataset (Deng et al., 2009) which we then transfer-learn to the datasets: CIFAR10, CIFAR100, CIFAR100-LT (Krizhevsky, 2009) and SVHN (Netzer et al., 2011)." and "Figure 2 depicts performance versus inference cost curves. Appendix 9.8 presents results on additional datasets and architectures." and "Appendix 9.6 provides an ablation study which demonstrates the value of learnable gates and joint training." |
| Researcher Affiliation | Academia | "Florence Regol, Joud Chataoui & Mark Coates, McGill University, International Laboratory on Learning Systems (ILLS), Mila - Québec AI Institute. {florence.robert-regol, joud.chataoui}@mail.mcgill.ca, mark.coates@mcgill.ca" |
| Pseudocode | Yes | "Algorithm 1 in Appendix 9.5 provides a detailed exposition of the entire algorithm." and "Algorithm 1 Training" in Appendix 9.5. |
| Open Source Code | Yes | "Code to reproduce our experiments is available at our Github repository." (paper footnote 1) |
| Open Datasets | Yes | "We use the vision transformers T2T-ViT-7 and T2T-ViT-14 (Yuan et al., 2021) pretrained on the ImageNet dataset (Deng et al., 2009) which we then transfer-learn to the datasets: CIFAR10, CIFAR100, CIFAR100-LT (Krizhevsky, 2009) and SVHN (Netzer et al., 2011)." |
| Dataset Splits | Yes | "The CIFAR10 and CIFAR100 (Krizhevsky, 2009) datasets both consist of 60,000 32×32 coloured images. ... We follow a 75%-8.3%-16.6% train-validation-test split." and "ImageNet (Deng et al., 2009) consists of 1.2 million training images spanning 1,000 classes. We reserved 50,000 of these images to be used as validation set and used another 50,000 images as a test set." and "We use a 68.6%-5%-26.2% train-validation-test split." for SVHN. (A split sketch is shown after the table.) |
| Hardware Specification | No | "We report the average training time on a GPUx machine and number of parameters used for both architectures in Table 1. Overall, our approach takes a little longer to train and has a negligible number of additional parameters." (No specific GPU model, CPU, or memory details are given.) |
| Software Dependencies | No | "We use Meta's FAIR fvcore library for computing Mul-Adds. We obtain results that match the Mul-Adds reported in Yuan et al. (2021) for the T2T-ViT architecture." (No specific versions for fvcore or any other relevant library/framework are given; a usage sketch is shown after the table.) |
| Experiment Setup | Yes | "We use the Adam optimizer with a learning rate of 0.01 with a weight decay of 5e-4. We use a batch size of 64 for CIFAR10, CIFAR100 and CIFAR100-LT, and a batch size of 256 for SVHN. We train until convergence using early stopping with a maximum of 15 epochs (E) for CIFAR10 and SVHN, 20 for CIFAR100 and 30 for CIFAR100-LT. For the number of warmup epochs WE, we perform a hyperparameter search over the range {1,...,E/2}." and "Following the data transform of Han et al. (2023b), the images are cropped to be of size 224×224." (A training-setup sketch is shown after the table.) |
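
To make the quoted CIFAR10 split concrete, here is a minimal sketch of one way to realize a 75%-8.3%-16.6% train-validation-test split over the pooled 60,000 images with torchvision. The use of `random_split`, the pooling of the official train/test parts, and the seed are assumptions for illustration, not details confirmed by the paper.

```python
# Hypothetical sketch: a 75%-8.3%-16.6% split of CIFAR10's 60,000 images.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_part = datasets.CIFAR10("data", train=True, download=True, transform=transform)
test_part = datasets.CIFAR10("data", train=False, download=True, transform=transform)
full = ConcatDataset([train_part, test_part])  # 60,000 images total

n = len(full)
n_train = int(0.75 * n)         # 45,000 images
n_val = int(0.083 * n)          # ~5,000 images
n_test = n - n_train - n_val    # ~10,000 images

generator = torch.Generator().manual_seed(0)  # assumed seed, not from the paper
train_set, val_set, test_set = random_split(
    full, [n_train, n_val, n_test], generator=generator
)
print(len(train_set), len(val_set), len(test_set))
```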
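
The paper states that Mul-Adds were computed with Meta's fvcore library (version unspecified). Below is a hedged sketch of how such a measurement typically looks with fvcore's `FlopCountAnalysis`; the `resnet18` stand-in model is an assumption, since the paper's T2T-ViT checkpoints are not reproduced here.

```python
# Sketch of counting Mul-Adds with fvcore. fvcore counts one fused
# multiply-add as a single operation, so total() corresponds to the
# Mul-Adds figure rather than raw FLOPs.
import torch
from fvcore.nn import FlopCountAnalysis
from torchvision.models import resnet18  # stand-in for T2T-ViT (assumption)

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)  # images are cropped to 224x224

with torch.no_grad():
    flops = FlopCountAnalysis(model, dummy_input)
    print(f"Mul-Adds: {flops.total() / 1e9:.2f} G")
```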
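
The quoted optimization settings translate directly into a short PyTorch sketch. Only the Adam hyperparameters, batch size, image size, and epoch cap come from the paper; the tiny stand-in model, the synthetic batch, and the early-stopping patience are assumptions, since the paper does not specify them.

```python
# Hedged sketch of the quoted optimizer and early-stopping setup.
import torch
from torch import nn

# Tiny stand-in classifier; the paper uses T2T-ViT backbones with exit gates.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch standing in for a 224x224-cropped CIFAR10 loader (batch size 64).
x = torch.randn(64, 3, 224, 224)
y = torch.randint(0, 10, (64,))

MAX_EPOCHS = 15   # E for CIFAR10/SVHN; 20 for CIFAR100, 30 for CIFAR100-LT
PATIENCE = 3      # assumed early-stopping patience (not stated in the paper)
best_val, stale = float("inf"), 0

for epoch in range(MAX_EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    val_loss = loss.item()  # stand-in for a real validation pass
    if val_loss < best_val - 1e-4:
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= PATIENCE:
            break  # early stopping
```

The warmup-epoch hyperparameter WE from the quote would be searched over {1, ..., E/2} on top of this loop; that search logic is omitted here since the paper gives no further detail on it.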