TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning

Authors: Han Cai, Chuang Gan, Ligeng Zhu, Song Han

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 9 image classification datasets with the same pre-trained model (ProxylessNAS-Mobile [11]) demonstrate the effectiveness of TinyTL compared to previous transfer learning methods.
Researcher Affiliation | Collaboration | Han Cai (1), Chuang Gan (2), Ligeng Zhu (1), Song Han (1); (1) Massachusetts Institute of Technology, (2) MIT-IBM Watson AI Lab
Pseudocode | No | The paper includes equations and architectural diagrams but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. The link 'http://tinyml.mit.edu/' is a project website, not a direct code repository.
Open Datasets | Yes | Following the common practice [43,44,45], we use ImageNet [35] as the pre-training dataset, and then transfer the models to 8 downstream object classification tasks, including Cars [41], Flowers [51], Aircraft [40], CUB [52], Pets [53], Food [54], CIFAR10 [55], and CIFAR100 [55]. Besides object classification, we also evaluate our TinyTL on human facial attribute classification tasks, where CelebA [56] is the transfer dataset and VGGFace2 [57] is the pre-training dataset.
Dataset Splits | No | The paper describes the datasets used and reports accuracy on them, implying standard splits, but it does not explicitly state train/validation/test percentages or sample counts, nor does it cite a source for these splits.
Hardware Specification | No | The paper states that models were fine-tuned 'on a single GPU' but does not specify the GPU model or any other hardware components (e.g., CPU, RAM) used for the experiments.
Software Dependencies | No | The paper mentions using PyTorch but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup | Yes | The models are fine-tuned for 50 epochs using the Adam optimizer [60] with batch size 8 on a single GPU. The initial learning rate is tuned for each dataset while cosine schedule [61] is adopted for learning rate decay. For each MB-block in ProxylessNAS-Mobile, we insert a lite residual module... The group number is 2, and the kernel size is 5. We use the ReLU activation... We replace all BN layers with GN layers... We set the number of channels per group to 8 for all GN layers. (See the configuration sketch below the table.)
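To make the quoted experiment setup concrete, here is a minimal PyTorch sketch covering the reported pieces: Adam with a cosine learning-rate schedule over 50 epochs, BatchNorm replaced by GroupNorm with 8 channels per group, and a lite residual branch with group number 2, kernel size 5, and ReLU. The module layout (average pooling plus nearest-neighbor upsampling for the reduced-resolution branch), function names, and the freezing comments are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of the reported fine-tuning setup, assuming standard PyTorch APIs.
# Channel counts are assumed divisible by the group sizes used below.
import torch
import torch.nn as nn


class LiteResidualModule(nn.Module):
    """Simplified stand-in for a lite residual branch added to a frozen MB-block:
    downsample -> group conv (groups=2, kernel_size=5) -> GN -> ReLU -> upsample."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.AvgPool2d(2),  # work at reduced resolution to shrink activation memory
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=5, padding=2, groups=2, bias=False),
            # 8 channels per group, matching the reported GN configuration
            nn.GroupNorm(num_groups=out_channels // 8, num_channels=out_channels),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )

    def forward(self, x: torch.Tensor, main_out: torch.Tensor) -> torch.Tensor:
        # Add the lightweight branch to the (frozen) main block's output.
        return main_out + self.branch(x)


def replace_bn_with_gn(model: nn.Module, channels_per_group: int = 8) -> nn.Module:
    """Recursively swap every BatchNorm2d for GroupNorm with 8 channels per group."""
    for name, module in model.named_children():
        if isinstance(module, nn.BatchNorm2d):
            num_channels = module.num_features
            setattr(model, name,
                    nn.GroupNorm(num_channels // channels_per_group, num_channels))
        else:
            replace_bn_with_gn(module, channels_per_group)
    return model


def build_finetune_optimizer(model: nn.Module, init_lr: float, epochs: int = 50):
    """Adam plus cosine learning-rate decay, as in the reported setup.
    The initial learning rate is tuned per dataset; the scheduler is stepped once per epoch."""
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=init_lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```

A surrounding training loop would supply the batch size of 8 from the quoted setup and mark which parameters require gradients; in TinyTL only the lite residual modules, biases, and the classifier head are updated while the backbone weights stay frozen, which is what reduces training memory rather than parameter count.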