Memory-efficient Patch-based Inference for Tiny Deep Learning

Authors: Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compared MCUNetV2 with existing state-of-the-art solutions on ImageNet classification under two hardware settings: 256kB SRAM/1MB Flash and 512kB SRAM/2MB Flash. The former represents a widely used Cortex-M4 microcontroller; the latter corresponds to a higher-end Cortex-M7. The goal is to achieve the highest ImageNet accuracy on resource-constrained MCUs (Table 2).
Researcher Affiliation | Collaboration | Ji Lin (MIT), Wei-Ming Chen (MIT), Han Cai (MIT), Chuang Gan (MIT-IBM Watson AI Lab), Song Han (MIT)
Pseudocode | No | We provide the details and pseudo code in supplementary.
Open Source Code | No | The paper mentions a project website "https://mcunet.mit.edu" and refers to extending "TinyEngine [27]" but does not explicitly provide a direct link to the source code for MCUNetV2's methodology or state that the code is publicly released for this paper's work.
Open Datasets | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: being able to fit larger resolution on the MCU.
Dataset Splits | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: being able to fit larger resolution on the MCU. We follow [27] for super network training and evolutionary search, detailed in the supplementary.
Hardware Specification | Yes | We benchmark the models on 3 MCU models with different hardware resources: STM32F412 (Cortex-M4, 256kB SRAM/1MB Flash), STM32F746 (Cortex-M7, 320kB SRAM/1MB Flash), STM32H743 (Cortex-M7, 512kB SRAM/2MB Flash). The memory usage is measured in int8. (footnote) measured with official PyTorch code (MIT License) using a batch size of 64 on NVIDIA Titan RTX.
Software Dependencies | No | The paper mentions using "TensorFlow Lite Micro [1], TinyEngine [27], microTVM [8]" and "PyTorch" (in a footnote) but does not provide specific version numbers for these software components.
Experiment Setup | No | The paper mentions "Models are quantized to int8 for deployment" and that they "follow [27] for super network training and evolutionary search, detailed in the supplementary." However, it does not provide specific hyperparameters like learning rate, batch size, or number of epochs within the main text.
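
For readers unfamiliar with the paper's titular technique: patch-based inference runs the memory-heavy initial stage of a CNN on small spatial patches, so the peak activation that must fit in SRAM shrinks from the whole feature map to a single patch. The following is a minimal PyTorch sketch of the idea, not the authors' implementation; the model, the patch count, and the run_stage1_per_patch helper are hypothetical, and it omits the receptive-field overlap (and the paper's receptive field redistribution that reduces the resulting recomputation) needed for exact outputs.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model: an initial stage with large activations,
# then a lightweight classifier tail. Layer shapes are illustrative only.
stage1 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)
tail = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

def run_stage1_per_patch(x, patches=2):
    # Execute stage1 on one spatial patch at a time, so only a patch-sized
    # activation is alive at any moment, then stitch the partial outputs.
    # An exact version must overlap patches by the stage's receptive field
    # (the "halo"); that bookkeeping is omitted here.
    n, c, h, w = x.shape
    ph, pw = h // patches, w // patches
    rows = []
    for i in range(patches):
        cols = [stage1(x[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw])
                for j in range(patches)]
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)

x = torch.randn(1, 3, 224, 224)
logits = tail(run_stage1_per_patch(x))  # per-patch initial stage, then tail
```

With patches=2, each stage1 call touches a 112x112 input rather than the full 224x224 image, roughly quartering the peak activation footprint of the initial stage.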
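
The Research Type and Hardware Specification rows tie the evaluation to hard SRAM budgets (256kB, 320kB, 512kB) with memory measured in int8. A common first-order check against such budgets treats peak memory as the largest co-resident input/output activation pair at one byte per element; the sketch below assumes that accounting. The helper name and the leaf-module heuristic are mine, and the paper's exact measurement (e.g. any im2col scratch buffers) may differ.

```python
import torch
import torch.nn as nn

def peak_activation_bytes_int8(model, input_size=(1, 3, 224, 224)):
    # Estimate peak SRAM as the largest (input + output) activation pair
    # over all leaf layers, at 1 byte per element for int8.
    peak = 0

    def hook(module, inputs, output):
        nonlocal peak
        if inputs and isinstance(inputs[0], torch.Tensor) \
                and isinstance(output, torch.Tensor):
            peak = max(peak, inputs[0].numel() + output.numel())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if len(list(m.children())) == 0]
    with torch.no_grad():
        model(torch.randn(*input_size))
    for h in handles:
        h.remove()
    return peak

# Example: does a toy model fit the STM32F412's 256kB SRAM budget?
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
peak = peak_activation_bytes_int8(model)
print(peak, peak <= 256 * 1024)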
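
The Experiment Setup row quotes "Models are quantized to int8 for deployment" without giving a recipe. Below is a generic post-training static quantization sketch using PyTorch's eager-mode torch.ao.quantization API, offered only as one plausible int8 conversion path; MCUNetV2's actual deployment goes through an MCU toolchain (TinyEngine / TensorFlow Lite Micro), and the TinyNet model and dummy calibration data here are hypothetical.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qconfig, prepare, convert)

class TinyNet(nn.Module):
    # Toy float model; QuantStub/DeQuantStub bracket the int8 region.
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = QuantStub(), DeQuantStub()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")  # server backend; MCU stacks differ
prepared = prepare(model)                      # insert observers
prepared(torch.randn(8, 3, 32, 32))            # calibration pass on dummy data
model_int8 = convert(prepared)                 # int8 weights and activations
```

Real deployments calibrate on a held-out slice of the training data rather than random tensors, and fuse conv/ReLU pairs before conversion for better accuracy and speed.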