TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

Authors: Young D. Kwon, Rui Li, Stylianos Venieris, Jagmohan Chauhan, Nicholas Donald Lane, Cecilia Mascolo

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0% in accuracy, while reducing the backward-pass memory and computation cost by up to 1,098× and 7.68×, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5× faster and 3.5× more energy-efficient training over status-quo approaches, and 2.23× smaller memory footprint than SOTA methods, while remaining within the 1 MB memory envelope of MCU-grade platforms. ... Table 1 summarises accuracy results of TinyTrain and various baselines after adapting to cross-domain target datasets, averaged over 200 runs.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Technology, University of Cambridge, United Kingdom; (2) Samsung AI Center, Cambridge, United Kingdom; (3) School of Electronics and Computer Science, University of Southampton, United Kingdom.
Pseudocode | Yes | Algorithm 1: Online learning stage of TinyTrain (a hedged sketch of an online sparse-update loop of this kind follows the table).
Open Source Code | Yes | Code is available at https://github.com/theyoungkwon/TinyTrain
Open Datasets | Yes | We use MiniImageNet (Vinyals et al., 2016) as the meta-train dataset, following the same setting as prior works on cross-domain FSL (Hu et al., 2022; Triantafillou et al., 2020). For meta-test datasets (i.e. target datasets of different domains than the source dataset of MiniImageNet), we employ all nine out-of-domain datasets of various domains from Meta-Dataset (Triantafillou et al., 2020).
Dataset Splits | Yes | Specifically, MiniImageNet contains 100 classes from ImageNet-1k, split into 64 training, 16 validation, and 20 testing classes. ... Following Triantafillou et al. (2020), the number of classes and support/query sets are sampled uniformly at random regarding the dataset specifications. (See Appendix B.1 for details of the sampling algorithm.) An illustrative episode-sampling sketch follows the table.
Hardware Specification | Yes | The offline component of our system is built on top of PyTorch (version 1.10) and runs on a Linux server equipped with an Intel Xeon Gold 5218 CPU and NVIDIA Quadro RTX 8000 GPU. This component is used to obtain the pre-trained model weights, i.e. pre-training and meta-training. Then, the online component of our system is implemented and evaluated on Raspberry Pi Zero 2 and NVIDIA Jetson Nano, which constitute widely used and representative embedded platforms. Pi Zero 2 is equipped with a quad-core 64-bit ARM Cortex-A53 and limited 512 MB RAM. Jetson Nano has a quad-core ARM Cortex-A57 processor with 4 GB of RAM.
Software Dependencies | Yes | The offline component of our system is built on top of PyTorch (version 1.10) and runs on a Linux server equipped with an Intel Xeon Gold 5218 CPU and NVIDIA Quadro RTX 8000 GPU.
Experiment Setup | Yes | We adopt a common training strategy to meta-train the pre-trained DNN backbones... Specifically, we meta-train the backbone for 100 epochs. Each epoch has 2000 episodes/tasks. A warm-up and learning rate scheduling with cosine annealing are used. The learning rate increases from 10⁻⁶ to 5×10⁻⁵ in 5 epochs. Then, it decreases to 10⁻⁶. We use SGD with momentum as an optimiser. ... We employed the Adam optimiser during meta-testing as it achieves the highest accuracy compared to other optimiser types. ... Note that a batch size of 100 is used for these two baselines as their accuracy degrades catastrophically with smaller batch sizes. Conversely, the other methods, including LastLayer, SparseUpdate, and TinyTrain, use a batch size of 1 and yield a smaller memory footprint and computational cost. A sketch of the meta-training learning-rate schedule follows the table.
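
The Pseudocode row above points to Algorithm 1, the online learning stage of TinyTrain, without reproducing it. Below is a minimal, hedged sketch in PyTorch (the stack quoted in the report) of what such an online sparse-update loop can look like: score parameter tensors on the few-shot support set, unfreeze only a small budget of them, and fine-tune those with Adam at batch size 1, as the Experiment Setup row notes for meta-testing. The |gradient × weight| importance proxy, the budget of four tensors, the learning rate, and the step count are illustrative assumptions, not the paper's selection criterion.

import torch
import torch.nn as nn

def select_tensors(model, support_x, support_y, loss_fn, budget=4):
    # Score every parameter tensor on the support set with an illustrative
    # |grad * weight| proxy (NOT the paper's criterion) and keep the top few.
    model.zero_grad()
    loss_fn(model(support_x), support_y).backward()
    scores = {name: (p.grad.detach() * p.detach()).abs().sum().item()
              for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    return set(sorted(scores, key=scores.get, reverse=True)[:budget])

def sparse_finetune(model, support_x, support_y, steps=40, lr=1e-4, budget=4):
    # Online stage: freeze everything except the selected tensors, then
    # fine-tune only those on the support set (sparse update).
    loss_fn = nn.CrossEntropyLoss()
    chosen = select_tensors(model, support_x, support_y, loss_fn, budget)
    for name, p in model.named_parameters():
        p.requires_grad_(name in chosen)
    optimiser = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(steps):
        optimiser.zero_grad()
        loss_fn(model(support_x), support_y).backward()
        optimiser.step()
    return model

The sketch only captures the overall freeze, score, select, update structure; TinyTrain's own selection is described in the paper as dynamic and task-adaptive at layer/channel granularity under a multi-objective criterion.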
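
The Dataset Splits row quotes a 64/16/20 class split of MiniImageNet and episodes whose way and support/query sizes are sampled at random. The sketch below illustrates that kind of episodic sampler under simplifying assumptions: the uniform way range and fixed shot/query counts are placeholders, whereas the actual protocol of Triantafillou et al. (2020) follows its own dataset-specific specification (Appendix B.1 of the paper).

import random

# 64/16/20 class split of MiniImageNet, as quoted in the Dataset Splits row.
TRAIN_CLASSES = list(range(0, 64))
VAL_CLASSES   = list(range(64, 80))
TEST_CLASSES  = list(range(80, 100))

def sample_episode(class_pool, images_per_class, min_way=5, max_way=20,
                   shots=5, queries=10):
    # Sample one few-shot episode: a random number of classes (the "way"),
    # then a support set and a query set per class. The ranges here are
    # illustrative, not the Meta-Dataset sampling rules.
    n_way = random.randint(min_way, min(max_way, len(class_pool)))
    episode_classes = random.sample(class_pool, n_way)
    support, query = [], []
    for label, cls in enumerate(episode_classes):
        imgs = random.sample(images_per_class[cls], shots + queries)
        support += [(img, label) for img in imgs[:shots]]
        query += [(img, label) for img in imgs[shots:]]
    return support, query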
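
The Experiment Setup row describes the meta-training schedule: 100 epochs of 2000 episodes each, a 5-epoch warm-up from 10⁻⁶ to 5×10⁻⁵, cosine annealing back down to 10⁻⁶, and SGD with momentum. A minimal PyTorch sketch of that schedule follows; the momentum value and the stand-in model are assumptions, not values taken from the paper.

import math
import torch

def lr_at_epoch(epoch, total_epochs=100, warmup_epochs=5,
                lr_min=1e-6, lr_max=5e-5):
    # Linear warm-up for the first 5 epochs, then cosine annealing back down.
    if epoch < warmup_epochs:
        return lr_min + (lr_max - lr_min) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(640, 64)   # stand-in for the meta-trained backbone
optimiser = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)

for epoch in range(100):           # 100 meta-training epochs
    for group in optimiser.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... run this epoch's 2000 episodes/tasks and optimiser steps here ...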