Memory-efficient Patch-based Inference for Tiny Deep Learning
Authors: Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compared MCUNetV2 with existing state-of-the-art solutions on ImageNet classification under two hardware settings: 256kB SRAM/1MB Flash and 512kB SRAM/2MB Flash. The former represents a widely used Cortex-M4 microcontroller; the latter corresponds to a higher-end Cortex-M7. The goal is to achieve the highest ImageNet accuracy on resource-constrained MCUs (Table 2). |
| Researcher Affiliation | Collaboration | Ji Lin¹, Wei-Ming Chen¹, Han Cai¹, Chuang Gan², Song Han¹ (¹MIT, ²MIT-IBM Watson AI Lab) |
| Pseudocode | No | We provide the details and pseudo code in supplementary. (The pseudocode is not in the main text; an illustrative patch-based inference sketch follows the table.) |
| Open Source Code | No | The paper mentions a project website "https://mcunet.mit.edu" and refers to extending "TinyEngine [27]" but does not explicitly provide a direct link to the source code for MCUNetV2's methodology or state that the code is publicly released for this paper's work. |
| Open Datasets | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: be able to fit larger resolution on the MCU. |
| Dataset Splits | Yes | We analyze the advantage of our method on image classification datasets: ImageNet [11] as the standard benchmark, and Visual Wake Words [10] to reflect TinyML applications. We further validate our method on object detection datasets: Pascal VOC [13] and WIDER FACE [48] to show our advantage: be able to fit larger resolution on the MCU. We follow [27] for super network training and evolutionary search, detailed in the supplementary. |
| Hardware Specification | Yes | We benchmark the models on 3 MCU models with different hardware resources: STM32F412 (Cortex-M4, 256kB SRAM/1MB Flash), STM32F746 (Cortex-M7, 320kB SRAM/1MB Flash), STM32H743 (Cortex-M7, 512kB SRAM/2MB Flash). The memory usage is measured in int8. (Footnote: measured with official PyTorch code (MIT License) using a batch size of 64 on NVIDIA Titan RTX.) A back-of-the-envelope memory calculation is sketched below the table. |
| Software Dependencies | No | The paper mentions using "TensorFlow Lite Micro [1], TinyEngine [27], microTVM [8]" and "PyTorch" (in a footnote) but does not provide specific version numbers for these software components. |
| Experiment Setup | No | The paper mentions "Models are quantized to int8 for deployment" and that they "follow [27] for super network training and evolutionary search, detailed in the supplementary." However, it does not provide specific hyperparameters such as learning rate, batch size, or number of epochs in the main text. (A generic int8 conversion sketch is given below the table.) |
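
The SRAM budgets in the Hardware Specification row are what make activation memory the binding constraint. As a back-of-the-envelope illustration (my own arithmetic with MobileNetV2-style shapes, not figures reported in the paper), even int8 activations in the early stage of a 224×224 classifier overwhelm the 256 kB Cortex-M4 budget when inference runs layer by layer:

```python
# Back-of-the-envelope int8 activation sizes; shapes are MobileNetV2-style
# examples at 224x224 input, not numbers taken from the paper.

def int8_bytes(h: int, w: int, c: int) -> int:
    """Bytes needed for an int8 activation tensor of shape (h, w, c)."""
    return h * w * c  # one byte per element

SRAM_BUDGET = 256 * 1024  # the Cortex-M4 setting from the table above

# Layer-by-layer inference must hold a layer's input and output at once.
inp = int8_bytes(112, 112, 16)   # a bottleneck output early in the network
out = int8_bytes(112, 112, 96)   # its 6x-expanded successor
peak = inp + out

print(f"input  buffer: {inp / 1024:7.1f} kB")   # ~196 kB
print(f"output buffer: {out / 1024:7.1f} kB")   # ~1176 kB
print(f"peak usage   : {peak / 1024:7.1f} kB vs {SRAM_BUDGET / 1024:.0f} kB budget")
```

This imbalance, where a few high-resolution early layers dominate peak memory, is the problem the paper's patch-based schedule targets.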
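The paper defers its pseudocode to the supplementary, so the following is only a minimal NumPy sketch of the core scheduling idea as I understand it: run the initial high-resolution stage patch by patch, enlarging each input window by a halo so the stitched output matches whole-image execution. The two-conv `stage`, the patch size, and the toy shapes are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv3x3(x, w):
    """Valid 3x3 convolution. x: (H, W, Cin), w: (3, 3, Cin, Cout)."""
    H, W, _ = x.shape
    y = np.zeros((H - 2, W - 2, w.shape[-1]), dtype=x.dtype)
    for i in range(H - 2):
        for j in range(W - 2):
            y[i, j] = np.tensordot(x[i:i+3, j:j+3], w, axes=3)
    return y

def stage(x, w1, w2):
    """A toy two-conv 'initial stage'; each valid 3x3 conv shrinks H, W by 2."""
    return conv3x3(conv3x3(x, w1), w2)

def stage_by_patches(x, w1, w2, patch=4, halo=4):
    """Run the stage patch by patch. Each `patch`-sized output tile needs an
    input window of size patch + halo (halo = 2 convs * 2 px of context).
    Only one small window and its intermediates are live at a time, which is
    the SRAM saving of patch-based inference."""
    H, W, _ = x.shape
    out_h, out_w = H - halo, W - halo
    y = np.zeros((out_h, out_w, w2.shape[-1]), dtype=x.dtype)
    for i in range(0, out_h, patch):
        for j in range(0, out_w, patch):
            win = x[i:i + patch + halo, j:j + patch + halo]
            y[i:i + patch, j:j + patch] = stage(win, w1, w2)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 20, 3)).astype(np.float32)
w1 = rng.standard_normal((3, 3, 3, 8)).astype(np.float32)
w2 = rng.standard_normal((3, 3, 8, 4)).astype(np.float32)
# Patch-by-patch execution reproduces whole-image execution exactly.
assert np.allclose(stage(x, w1, w2), stage_by_patches(x, w1, w2), atol=1e-4)
```

The overlapping halos are recomputed for every patch; that recomputation overhead is what the paper's receptive-field redistribution is designed to shrink.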
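Finally, the one deployment detail the main text does pin down is int8 quantization. The paper does not spell out a conversion recipe, so this is a generic post-training full-integer flow using the TensorFlow Lite converter, with a stock MobileNetV2 and random calibration images as stand-ins for the paper's NAS-derived models and real data:

```python
import numpy as np
import tensorflow as tf

# Stand-in model and calibration data; the paper's actual networks come from
# its NAS pipeline, a stock MobileNetV2 is used here only for shape.
model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3), weights=None)
rep_images = np.random.rand(100, 160, 160, 3).astype("float32")

def representative_dataset():
    # A modest sample of inputs lets the converter calibrate activation ranges.
    for image in rep_images:
        yield [image[None]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer kernels so the model can run on int8-only MCU runtimes.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

A full-integer artifact like this is what int8-only runtimes such as TensorFlow Lite Micro expect, and the paper's TinyEngine deployment presumably consumes a similarly quantized model.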