TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning

Authors: Han Cai, Chuang Gan, Ligeng Zhu, Song Han

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 9 image classification datasets with the same pre-trained model (ProxylessNAS-Mobile [11]) demonstrate the effectiveness of TinyTL compared to previous transfer learning methods.
Researcher Affiliation | Collaboration | Han Cai (1), Chuang Gan (2), Ligeng Zhu (1), Song Han (1); (1) Massachusetts Institute of Technology, (2) MIT-IBM Watson AI Lab
Pseudocode | No | The paper includes equations and architectural diagrams but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. The link 'http://tinyml.mit.edu/' is a project website, not a direct code repository.
Open Datasets | Yes | Following the common practice [43,44,45], we use ImageNet [35] as the pre-training dataset, and then transfer the models to 8 downstream object classification tasks, including Cars [41], Flowers [51], Aircraft [40], CUB [52], Pets [53], Food [54], CIFAR10 [55], and CIFAR100 [55]. Besides object classification, we also evaluate our TinyTL on human facial attribute classification tasks, where CelebA [56] is the transfer dataset and VGGFace2 [57] is the pre-training dataset.
Dataset Splits | No | The paper describes the datasets used and reports accuracy on them, implying standard splits, but it does not explicitly state train/validation/test percentages or sample counts, nor does it cite a source for these splits.
Hardware Specification | No | The paper states that models were fine-tuned 'on a single GPU' but does not specify the GPU model or any other hardware components (e.g., CPU, RAM) used for the experiments.
Software Dependencies | No | The paper mentions using PyTorch but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | The models are fine-tuned for 50 epochs using the Adam optimizer [60] with batch size 8 on a single GPU. The initial learning rate is tuned for each dataset while cosine schedule [61] is adopted for learning rate decay. For each MB-block in ProxylessNAS-Mobile, we insert a lite residual module... The group number is 2, and the kernel size is 5. We use the ReLU activation... We replace all BN layers with GN layers... We set the number of channels per group to 8 for all GN layers. (See the configuration sketch below the table.)
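To make the quoted experiment setup concrete, here is a minimal PyTorch sketch covering the reported pieces: Adam with a cosine learning-rate schedule over 50 epochs, BatchNorm replaced by GroupNorm with 8 channels per group, and a lite residual branch with group number 2, kernel size 5, and ReLU. The module layout (average pooling plus nearest-neighbor upsampling for the reduced-resolution branch), function names, and the freezing comments are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of the reported fine-tuning setup, assuming standard PyTorch APIs.
# Channel counts are assumed divisible by the group sizes used below.
import torch
import torch.nn as nn


class LiteResidualModule(nn.Module):
    """Simplified stand-in for a lite residual branch added to a frozen MB-block:
    downsample -> group conv (groups=2, kernel_size=5) -> GN -> ReLU -> upsample."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.AvgPool2d(2),  # work at reduced resolution to shrink activation memory
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=5, padding=2, groups=2, bias=False),
            # 8 channels per group, matching the reported GN configuration
            nn.GroupNorm(num_groups=out_channels // 8, num_channels=out_channels),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )

    def forward(self, x: torch.Tensor, main_out: torch.Tensor) -> torch.Tensor:
        # Add the lightweight branch to the (frozen) main block's output.
        return main_out + self.branch(x)


def replace_bn_with_gn(model: nn.Module, channels_per_group: int = 8) -> nn.Module:
    """Recursively swap every BatchNorm2d for GroupNorm with 8 channels per group."""
    for name, module in model.named_children():
        if isinstance(module, nn.BatchNorm2d):
            num_channels = module.num_features
            setattr(model, name,
                    nn.GroupNorm(num_channels // channels_per_group, num_channels))
        else:
            replace_bn_with_gn(module, channels_per_group)
    return model


def build_finetune_optimizer(model: nn.Module, init_lr: float, epochs: int = 50):
    """Adam plus cosine learning-rate decay, as in the reported setup.
    The initial learning rate is tuned per dataset; the scheduler is stepped once per epoch."""
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=init_lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

A surrounding training loop would supply the batch size of 8 from the quoted setup and mark which parameters require gradients; in TinyTL only the lite residual modules, biases, and the classifier head are updated while the backbone weights stay frozen, which is what reduces training memory rather than parameter count.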