Memory-efficient Patch-based Inference for Tiny Deep Learning

Authors: Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compared MCUNetV2 with existing state-of-the-art solutions on ImageNet classification under two hardware settings: 256kB SRAM/1MB Flash and 512kB SRAM/2MB Flash. The former represents a widely used Cortex-M4 microcontroller; the latter corresponds to a higher-end Cortex-M7. The goal is to achieve the highest ImageNet accuracy on resource-constrained MCUs (Table 2).
Researcher Affiliation | Collaboration | Ji Lin (MIT), Wei-Ming Chen (MIT), Han Cai (MIT), Chuang Gan (MIT-IBM Watson AI Lab), Song Han (MIT)
Pseudocode | No | We provide the details and pseudo code in supplementary.
Open Source Code | No | The paper mentions a project website "https://mcunet.mit.edu" and refers to extending "TinyEngine [27]" but does not explicitly provide a direct link to the source code for MCUNetV2's methodology or state that the code is publicly released for this paper's work.
Open Datasets | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: being able to fit larger resolution on the MCU.
Dataset Splits | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: being able to fit larger resolution on the MCU. We follow [27] for super network training and evolutionary search, detailed in the supplementary.
Hardware Specification | Yes | We benchmark the models on 3 MCU models with different hardware resources: STM32F412 (Cortex-M4, 256kB SRAM/1MB Flash), STM32F746 (Cortex-M7, 320kB SRAM/1MB Flash), STM32H743 (Cortex-M7, 512kB SRAM/2MB Flash). The memory usage is measured in int8. (footnote) measured with official PyTorch code (MIT License) using a batch size of 64 on NVIDIA Titan RTX.
Software Dependencies | No | The paper mentions using "TensorFlow Lite Micro [1], TinyEngine [27], microTVM [8]" and "PyTorch" (in a footnote) but does not provide specific version numbers for these software components.
Experiment Setup | No | The paper mentions "Models are quantized to int8 for deployment" and that they "follow [27] for super network training and evolutionary search, detailed in the supplementary." However, it does not provide specific hyperparameters like learning rate, batch size, or number of epochs within the main text.
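
For readers unfamiliar with the paper's titular technique: patch-based inference runs the memory-heavy initial stage of a CNN on small spatial patches, so the peak activation that must fit in SRAM shrinks from the whole feature map to a single patch. The following is a minimal PyTorch sketch of the idea, not the authors' implementation; the model, the patch count, and the run_stage1_per_patch helper are hypothetical, and it omits the receptive-field overlap (and the paper's receptive field redistribution that reduces the resulting recomputation) needed for exact outputs.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage model: an initial stage with large activations,
# then a lightweight classifier tail. Layer shapes are illustrative only.
stage1 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)
tail = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

def run_stage1_per_patch(x, patches=2):
    # Execute stage1 on one spatial patch at a time, so only a patch-sized
    # activation is alive at any moment, then stitch the partial outputs.
    # An exact version must overlap patches by the stage's receptive field
    # (the "halo"); that bookkeeping is omitted here.
    n, c, h, w = x.shape
    ph, pw = h // patches, w // patches
    rows = []
    for i in range(patches):
        cols = [stage1(x[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw])
                for j in range(patches)]
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)

x = torch.randn(1, 3, 224, 224)
logits = tail(run_stage1_per_patch(x))  # per-patch initial stage, then tail
```

With patches=2, each stage1 call touches a 112x112 input rather than the full 224x224 image, roughly quartering the peak activation footprint of the initial stage.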
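
The Research Type and Hardware Specification rows tie the evaluation to hard SRAM budgets (256kB, 320kB, 512kB) with memory measured in int8. A common first-order check against such budgets treats peak memory as the largest co-resident input/output activation pair at one byte per element; the sketch below assumes that accounting. The helper name and the leaf-module heuristic are mine, and the paper's exact measurement (e.g. any im2col scratch buffers) may differ.

```python
import torch
import torch.nn as nn

def peak_activation_bytes_int8(model, input_size=(1, 3, 224, 224)):
    # Estimate peak SRAM as the largest (input + output) activation pair
    # over all leaf layers, at 1 byte per element for int8.
    peak = 0

    def hook(module, inputs, output):
        nonlocal peak
        if inputs and isinstance(inputs[0], torch.Tensor) \
                and isinstance(output, torch.Tensor):
            peak = max(peak, inputs[0].numel() + output.numel())

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if len(list(m.children())) == 0]
    with torch.no_grad():
        model(torch.randn(*input_size))
    for h in handles:
        h.remove()
    return peak

# Example: does a toy model fit the STM32F412's 256kB SRAM budget?
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
peak = peak_activation_bytes_int8(model)
print(peak, peak <= 256 * 1024)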
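
The Experiment Setup row quotes "Models are quantized to int8 for deployment" without giving a recipe. Below is a generic post-training static quantization sketch using PyTorch's eager-mode torch.ao.quantization API, offered only as one plausible int8 conversion path; MCUNetV2's actual deployment goes through an MCU toolchain (TinyEngine / TensorFlow Lite Micro), and the TinyNet model and dummy calibration data here are hypothetical.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qconfig, prepare, convert)

class TinyNet(nn.Module):
    # Toy float model; QuantStub/DeQuantStub bracket the int8 region.
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = QuantStub(), DeQuantStub()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")  # server backend; MCU stacks differ
prepared = prepare(model)                      # insert observers
prepared(torch.randn(8, 3, 32, 32))            # calibration pass on dummy data
model_int8 = convert(prepared)                 # int8 weights and activations
```

Real deployments calibrate on a held-out slice of the training data rather than random tensors, and fuse conv/ReLU pairs before conversion for better accuracy and speed.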