TREC: Transient Redundancy Elimination-based Convolution

Authors: Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, Xipeng Shen

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TREC on diverse popular CNNs, CifarNet [1], SqueezeNet (with and without complex bypass) [14], ZfNet [31], and ResNet-34 [12] on a microcontroller (MCU)... Experiments show that by removing 96.25% transient redundancy, TREC achieves an average of 4.40× speedup compared to the conventional convolution operator. When applied to the full neural networks, TREC achieves an average of 3.51× speedup with virtually no accuracy loss.
Researcher Affiliation | Academia | Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, Xipeng Shen; Key Laboratory of Data Engineering and Knowledge Engineering (MOE), and School of Information, Renmin University of China; Computer Science Department, North Carolina State University; guanjw@ruc.edu.cn, fengzhang@ruc.edu.cn, liujiesong@ruc.edu.cn, hsung2@ncsu.edu, ruofanwu@ruc.edu.cn, duyong@ruc.edu.cn, xshen5@ncsu.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to its own source code for the methodology described.
Open Datasets | Yes | All trainings and inferences are performed on CIFAR-10, while for ResNet, we use the downsampled ImageNet [3] with 64×64 resolution.
Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet but does not explicitly provide training/validation/test split percentages, sample counts, or a methodology for partitioning the data. (A hedged CIFAR-10 loading sketch using the standard split follows the table.)
Hardware Specification | Yes | Specifically, all inferences are performed on an STM32F469NI MCU with 324KB SRAM and 2MB Flash, using the CMSIS-NN kernel optimized for Arm Cortex-M devices [20]. All trainings are performed using PyTorch 1.10.1 (open-source software with a BSD license) on a machine equipped with a 20-core 3.60GHz Intel Core i7-12700K processor, 128GB of RAM, and an NVIDIA GeForce RTX A6000 GPU with 48GB memory.
Software Dependencies | Yes | All trainings are performed using PyTorch 1.10.1 (open-source software with a BSD license).
Experiment Setup | Yes | The learning rate starts from 0.001 and decreases by 0.1 every 15 epochs. The batch size, momentum, and weight decay are set to 256, 0.9, and 10^-4, respectively, and the maximal epoch is set to 100. (A hedged PyTorch training-configuration sketch follows the table.)
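
The paper names the datasets (CIFAR-10, and downsampled 64×64 ImageNet for ResNet) but reports no explicit split percentages. Below is a minimal loading sketch using torchvision's standard CIFAR-10 partition of 50,000 training and 10,000 test images and the reported batch size of 256; the normalization statistics and worker count are illustrative assumptions, not values taken from the paper.

    # Minimal sketch: standard CIFAR-10 train/test split (50,000 / 10,000 images).
    # Normalization constants are commonly used CIFAR-10 statistics, assumed here.
    import torch
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])

    train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
    test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

    # Batch size of 256 follows the reported experiment setup.
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=256, shuffle=False, num_workers=4)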
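
The reported hyperparameters (initial learning rate 0.001 decayed by 0.1 every 15 epochs, batch size 256, momentum 0.9, weight decay 10^-4, at most 100 epochs) map directly onto a standard PyTorch optimizer and scheduler. The sketch below is a hedged reconstruction, not the authors' code: the choice of SGD and of ResNet-34 as the placeholder model are assumptions, since the quoted setup lists only the hyperparameter values.

    # Hedged sketch of the reported training schedule; SGD and the model choice are
    # assumptions -- the paper's quoted setup specifies only the hyperparameter values.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet34

    model = resnet34(num_classes=10)              # placeholder network for CIFAR-10
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.001,         # initial learning rate
                                momentum=0.9,
                                weight_decay=1e-4)
    # Decay the learning rate by 0.1 every 15 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)

    model.train()
    for epoch in range(100):                      # maximal epoch set to 100
        for images, labels in train_loader:       # train_loader as in the sketch above
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()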