Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On-Device Training Under 256KB Memory
Authors: Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. ... Our framework is the first practical solution for on-device transfer learning of visual recognition on tiny IoT devices (e.g., a microcontroller with only 256KB SRAM), using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy. ... 3 Experiments. We used three popular tiny ML models in our experiments: MobileNet V2... Proxyless NAS... MCUNet... We perform the training and memory/latency measurement on a microcontroller STM32F746 (320KB SRAM, 1MB Flash) using a single batch size. |
| Researcher Affiliation | Collaboration | Ji Lin¹, Ligeng Zhu¹, Wei-Ming Chen¹, Wei-Chen Wang¹, Chuang Gan², Song Han¹ (¹MIT, ²MIT-IBM Watson AI Lab) |
| Pseudocode | No | The paper describes its algorithms and system design with figures and textual explanations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | A video demo can be found here. (referring to https://tinyml.mit.edu/on-device-training). There is no explicit statement or link confirming the release of the source code for the methodology described in the paper. |
| Open Datasets | Yes | We measure the transfer learning accuracy on multiple downstream datasets and report the average accuracy [37]. We follow [12] to use a set of vision datasets including Cars [39], CIFAR10 [40], CIFAR-100 [40], CUB [67], Flowers [54], Food [9], and Pets [55]. We also include VWW dataset [20], a widely used benchmark for tiny ML applications. |
| Dataset Splits | Yes | We fine-tuned the models on all these datasets for 50 epochs following [12]. We train on VWW for 10 epochs following [47]. ... Figure 8. Training and validation loss curves w/ and w/o QAS. |
| Hardware Specification | Yes | We perform the training and memory/latency measurement on a microcontroller STM32F746 (320KB SRAM, 1MB Flash) using a single batch size. ... We deployed our training system to a Cortex M7 microcontroller STM32F746 to demonstrate the feasibility |
| Software Dependencies | No | The paper mentions software like PyTorch [56] and TensorFlow [4] for comparison, and TF-Lite Micro kernels, but does not provide specific version numbers for any software dependencies used in its own experimental setup. |
| Experiment Setup | Yes | We used three popular tiny ML models in our experiments: MobileNet V2 [60] (width multiplier 0.35...), Proxyless NAS [13] (width multiplier 0.3...), MCUNet [47] (...). We pre-trained the models on ImageNet [22] and perform post-training quantization [34]. The quantized models are fine-tuned on downstream datasets... We perform the training and memory/latency measurement... using a single batch size. ... We fine-tuned the models on all these datasets for 50 epochs following [12]. We train on VWW for 10 epochs following [47]. We used resolution 128 for all datasets and models for a fair comparison. Please refer to the appendix (Section C) for detailed training hyper-parameters. |
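The settings quoted in the Experiment Setup and Hardware Specification rows can be collected into a single configuration object, which makes it easier to check what a reproduction would need to match. The sketch below is a hypothetical Python summary, not code released by the authors: the class, field, and method names are illustrative assumptions. The MCUNet width multiplier is left as `None` because the quoted excerpt elides it, and the appendix hyper-parameters (Section C) are not included because they do not appear in the table.

```python
from dataclasses import dataclass, field


# Hypothetical summary of the setup quoted in the table; names are
# illustrative, values come from the quoted excerpts only.
@dataclass(frozen=True)
class TinyTrainingSetup:
    # Model -> width multiplier; None where the excerpt elides the value.
    models: dict = field(default_factory=lambda: {
        "MobileNetV2": 0.35,
        "ProxylessNAS": 0.3,
        "MCUNet": None,
    })
    # Transfer-learning datasets fine-tuned following [12].
    transfer_datasets: tuple = ("Cars", "CIFAR-10", "CIFAR-100", "CUB",
                                "Flowers", "Food", "Pets")
    # Tiny ML benchmark trained following [47].
    tinyml_dataset: str = "VWW"
    resolution: int = 128       # same input resolution for all datasets/models
    batch_size: int = 1         # "a single batch size"
    sram_kb: int = 320          # STM32F746 SRAM
    flash_kb: int = 1024        # STM32F746 Flash (1MB)
    memory_budget_kb: int = 256 # training memory target from the title

    def epochs_for(self, dataset: str) -> int:
        """Return the reported epoch count: 10 for VWW, 50 otherwise."""
        if dataset == self.tinyml_dataset:
            return 10
        if dataset in self.transfer_datasets:
            return 50
        raise ValueError(f"dataset not in reported setup: {dataset}")

    def budget_fits_sram(self) -> bool:
        """Check that the 256KB training budget fits in on-chip SRAM."""
        return self.memory_budget_kb <= self.sram_kb


if __name__ == "__main__":
    setup = TinyTrainingSetup()
    for name in (setup.tinyml_dataset,) + setup.transfer_datasets:
        print(name, setup.epochs_for(name))
    print("budget fits SRAM:", setup.budget_fits_sram())
```

Freezing the dataclass keeps the recorded setup from being mutated by accident, and `epochs_for` raises on unknown datasets rather than guessing a value the paper does not report.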