Jointly-Learned Exit and Inference for a Dynamic Neural Network
Authors: Florence Regol, Joud Chataoui, Mark Coates
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We use the vision transformers T2T-ViT-7 and T2T-ViT-14 (Yuan et al., 2021) pretrained on the ImageNet dataset (Deng et al., 2009) which we then transfer-learn to the datasets: CIFAR10, CIFAR100, CIFAR100-LT (Krizhevsky, 2009) and SVHN (Netzer et al., 2011)." and "Figure 2 depicts performance versus inference cost curves. Appendix 9.8 presents results on additional datasets and architectures." and "Appendix 9.6 provides an ablation study which demonstrates the value of learnable gates and joint training." |
| Researcher Affiliation | Academia | "Florence Regol, Joud Chataoui & Mark Coates, McGill University, International Laboratory on Learning Systems (ILLS), Mila - Québec AI Institute. {florence.robert-regol, joud.chataoui}@mail.mcgill.ca, mark.coates@mcgill.ca" |
| Pseudocode | Yes | "Algorithm 1 in Appendix 9.5 provides a detailed exposition of the entire algorithm." and "Algorithm 1 Training" in Appendix 9.5. |
| Open Source Code | Yes | "Code to reproduce our experiments is available at our Github repository." (paper footnote 1) |
| Open Datasets | Yes | "We use the vision transformers T2T-ViT-7 and T2T-ViT-14 (Yuan et al., 2021) pretrained on the ImageNet dataset (Deng et al., 2009) which we then transfer-learn to the datasets: CIFAR10, CIFAR100, CIFAR100-LT (Krizhevsky, 2009) and SVHN (Netzer et al., 2011)." |
| Dataset Splits | Yes | "The CIFAR10 and CIFAR100 (Krizhevsky, 2009) datasets both consist of 60,000 32×32 coloured images. ... We follow a 75%-8.3%-16.6% train-validation-test split." and "ImageNet (Deng et al., 2009) consists of 1.2 million training images spanning 1,000 classes. We reserved 50,000 of these images to be used as validation set and used another 50,000 images as a test set." and "We use a 68.6%-5%-26.2% train-validation-test split." for SVHN. (A split sketch is shown after the table.) |
| Hardware Specification | No | "We report the average training time on a GPUx machine and number of parameters used for both architectures in Table 1. Overall, our approach takes a little longer to train and has a negligible number of additional parameters." (No specific GPU model, CPU, or memory details are given.) |
| Software Dependencies | No | "We use Meta's FAIR fvcore library for computing Mul-Adds. We obtain results that match the Mul-Adds reported in Yuan et al. (2021) for the T2T-ViT architecture." (No specific versions for fvcore or any other relevant library/framework are given; a usage sketch is shown after the table.) |
| Experiment Setup | Yes | "We use the Adam optimizer with a learning rate of 0.01 with a weight decay of 5e-4. We use a batch size of 64 for CIFAR10, CIFAR100 and CIFAR100-LT, and a batch size of 256 for SVHN. We train until convergence using early stopping with a maximum of 15 epochs (E) for CIFAR10 and SVHN, 20 for CIFAR100 and 30 for CIFAR100-LT. For the number of warmup epochs WE, we perform a hyperparameter search over the range {1,...,E/2}." and "Following the data transform of Han et al. (2023b), the images are cropped to be of size 224×224." (A training-setup sketch is shown after the table.) |
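
To make the quoted CIFAR10 split concrete, here is a minimal sketch of one way to realize a 75%-8.3%-16.6% train-validation-test split over the pooled 60,000 images with torchvision. The use of `random_split`, the pooling of the official train/test parts, and the seed are assumptions for illustration, not details confirmed by the paper.

```python
# Hypothetical sketch: a 75%-8.3%-16.6% split of CIFAR10's 60,000 images.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_part = datasets.CIFAR10("data", train=True, download=True, transform=transform)
test_part = datasets.CIFAR10("data", train=False, download=True, transform=transform)
full = ConcatDataset([train_part, test_part])  # 60,000 images total

n = len(full)
n_train = int(0.75 * n)         # 45,000 images
n_val = int(0.083 * n)          # ~5,000 images
n_test = n - n_train - n_val    # ~10,000 images

generator = torch.Generator().manual_seed(0)  # assumed seed, not from the paper
train_set, val_set, test_set = random_split(
    full, [n_train, n_val, n_test], generator=generator
)
print(len(train_set), len(val_set), len(test_set))
```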
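
The paper states that Mul-Adds were computed with Meta's fvcore library (version unspecified). Below is a hedged sketch of how such a measurement typically looks with fvcore's `FlopCountAnalysis`; the `resnet18` stand-in model is an assumption, since the paper's T2T-ViT checkpoints are not reproduced here.

```python
# Sketch of counting Mul-Adds with fvcore. fvcore counts one fused
# multiply-add as a single operation, so total() corresponds to the
# Mul-Adds figure rather than raw FLOPs.
import torch
from fvcore.nn import FlopCountAnalysis
from torchvision.models import resnet18  # stand-in for T2T-ViT (assumption)

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)  # images are cropped to 224x224

with torch.no_grad():
    flops = FlopCountAnalysis(model, dummy_input)
    print(f"Mul-Adds: {flops.total() / 1e9:.2f} G")
```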
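
The quoted optimization settings translate directly into a short PyTorch sketch. Only the Adam hyperparameters, batch size, image size, and epoch cap come from the paper; the tiny stand-in model, the synthetic batch, and the early-stopping patience are assumptions, since the paper does not specify them.

```python
# Hedged sketch of the quoted optimizer and early-stopping setup.
import torch
from torch import nn

# Tiny stand-in classifier; the paper uses T2T-ViT backbones with exit gates.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch standing in for a 224x224-cropped CIFAR10 loader (batch size 64).
x = torch.randn(64, 3, 224, 224)
y = torch.randint(0, 10, (64,))

MAX_EPOCHS = 15   # E for CIFAR10/SVHN; 20 for CIFAR100, 30 for CIFAR100-LT
PATIENCE = 3      # assumed early-stopping patience (not stated in the paper)
best_val, stale = float("inf"), 0

for epoch in range(MAX_EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    val_loss = loss.item()  # stand-in for a real validation pass
    if val_loss < best_val - 1e-4:
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= PATIENCE:
            break  # early stopping
```

The warmup-epoch hyperparameter WE from the quote would be searched over {1, ..., E/2} on top of this loop; that search logic is omitted here since the paper gives no further detail on it.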