MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Authors: Yinan Liang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory on STM32F746 microcontroller. Code is available at https://github.com/liangyn22/MCUFormer. |
| Researcher Affiliation | Academia | ¹Shenzhen International Graduate School and ²Department of Automation, Tsinghua University |
| Pseudocode | No | We detail the pseudo algorithm of integer-only square root operation in the supplementary material. |
| Open Source Code | Yes | Code is available at https://github.com/liangyn22/MCUFormer. |
| Open Datasets | Yes | We conduct the experiments on ImageNet for image classification, which contains 1.2 million training images and 50k validation images from 1000 classes. All images are scaled and biased into the range [-1, 1] for normalization. For the training process, we resize the images with the shorter side as 256 and randomly crop a 240×240 region. |
| Dataset Splits | Yes | We conduct the experiments on ImageNet for image classification, which contains 1.2 million training images and 50k validation images from 1000 classes. |
| Hardware Specification | Yes | We deploy the vision transformers with our hardware-algorithm co-optimization framework on different microcontrollers with various resource constraints, including STM32F427 (Cortex-M4/256KB memory/1MB flash), STM32F746 (Cortex-M7/320KB memory/1MB flash) and STM32H743 (Cortex-M7/512KB memory/2MB flash). |
| Software Dependencies | No | The paper mentions software frameworks like TensorFlow Lite Micro, CMSIS-NN, CMix-NN, MicroTVM, and TinyEngine, but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | For the network architecture search of vision transformers, our search space consists of the low-rank decomposition ratio r and the token size c, selected from r ∈ [0.4 : 0.05 : 0.95] and c ∈ {16, 20, 24, 28, 32}. ... For operator library construction, we utilize int8 quantization for all tensors in the vision transformer during inference. The filter size of the decomposed patch embedding layer is set to 4×4 with multiple forward passes to reduce the peak memory, and we iterate the surrogate assignment from the fixed-point iterative method 4 times to calculate the square root in the layer normalization operators. (Both the search space and the square-root iteration are sketched in C below.) |
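The Open Datasets row describes scaling and biasing pixels into [-1, 1]. A one-line C helper shows the arithmetic; the function name and the float intermediate are illustrative assumptions (a deployed int8 pipeline would fold this into the quantization scale and zero point rather than compute floats on-device).

```c
#include <stdint.h>

/* Map an 8-bit pixel p in [0, 255] to the [-1, 1] range the paper
 * uses for normalization: p / 127.5 - 1. Illustrative sketch only;
 * on-device this would be folded into the int8 quantization params. */
static inline float normalize_pixel(uint8_t p)
{
    return (float)p / 127.5f - 1.0f;
}
```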
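The Experiment Setup row specifies a grid over the low-rank decomposition ratio r and the token size c. A minimal C sketch that enumerates this grid makes its size explicit (12 ratio steps × 5 token sizes = 60 candidates); the exhaustive traversal and the print format are illustrative assumptions, not the authors' actual search procedure.

```c
#include <stdio.h>

int main(void)
{
    /* Token sizes c and decomposition ratios r as reported in the
     * paper; exhaustive enumeration is for illustration only and is
     * not necessarily how the authors traverse the search space. */
    const int c_choices[] = {16, 20, 24, 28, 32};
    for (int i = 0; i <= 11; i++) {          /* r = 0.40, 0.45, ..., 0.95 */
        double r = 0.40 + 0.05 * (double)i;
        for (int j = 0; j < 5; j++)
            printf("candidate: r=%.2f c=%d\n", r, c_choices[j]);
    }
    return 0;                                /* 12 * 5 = 60 candidates */
}
```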
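The same row notes that square roots in the layer normalization operators are computed with an integer-only routine iterated 4 times, with the exact pseudocode deferred to the supplementary material (see the Pseudocode row). Below is a hedged C sketch of the standard Newton fixed-point iteration for integer square root; the function name, the bit-length initial guess, and the use of `__builtin_clz` (GCC/Clang, available on the Cortex-M toolchains named above) are assumptions, and only the 4-iteration count comes from the paper.

```c
#include <stdint.h>

/* Integer-only square root via Newton fixed-point iteration.
 * Sketch only: the authors' actual pseudocode is in their
 * supplementary material; the paper states only that the
 * iteration is applied 4 times in layer normalization. */
static uint32_t isqrt_newton4(uint32_t x)
{
    if (x == 0) return 0;
    /* Initial guess 2^ceil(b/2), where b is the bit length of x;
     * this lies within a factor of 2 of sqrt(x), so four Newton
     * updates already land within +/-1 of the exact root. */
    uint32_t g = 1u << ((32 - (uint32_t)__builtin_clz(x) + 1) / 2);
    for (int i = 0; i < 4; i++)
        g = (g + x / g) / 2;   /* Newton update, integer division */
    return g;
}
```

On Cortex-M4/M7 the `x / g` division maps to the hardware `UDIV` instruction, so the four iterations stay cheap relative to the surrounding int8 kernels.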