TREC: Transient Redundancy Elimination-based Convolution
Authors: Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, Xipeng Shen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TREC on diverse popular CNNs, CifarNet [1], SqueezeNet (with and without complex bypass) [14], ZFNet [31], and ResNet-34 [12] on a microcontroller (MCU)... Experiments show that by removing 96.25% transient redundancy, TREC achieves an average of 4.40× speedup compared to the conventional convolution operator. When applied to the full neural networks, TREC achieves an average of 3.51× speedup with virtually no accuracy loss. |
| Researcher Affiliation | Academia | Jiawei Guan, Feng Zhang, Jiesong Liu, Hsin-Hsuan Sung, Ruofan Wu, Xiaoyong Du, Xipeng Shen; Key Laboratory of Data Engineering and Knowledge Engineering (MOE), and School of Information, Renmin University of China; Computer Science Department, North Carolina State University. guanjw@ruc.edu.cn, fengzhang@ruc.edu.cn, liujiesong@ruc.edu.cn, hsung2@ncsu.edu, ruofanwu@ruc.edu.cn, duyong@ruc.edu.cn, xshen5@ncsu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to its own source code for the methodology described. |
| Open Datasets | Yes | All trainings and inferences are performed on CIFAR-10, while for ResNet, we use the downsampled ImageNet [3] with 64×64 resolution |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet datasets but does not explicitly provide specific training/validation/test split percentages, sample counts, or a clear methodology for partitioning the data. |
| Hardware Specification | Yes | Specifically, all inferences are performed on an STM32F469NI MCU with 324KB SRAM and 2MB Flash, using the CMSIS-NN kernel optimized for Arm Cortex-M devices [20]. All trainings are performed using PyTorch 1.10.1 (open-source software with a BSD license) on a machine equipped with a 20-core 3.60GHz Intel Core i7-12700K processor, 128GB of RAM, and an NVIDIA GeForce RTX A6000 GPU with 48 GB memory. |
| Software Dependencies | Yes | All trainings are performed using PyTorch 1.10.1 (open-source software with a BSD license) |
| Experiment Setup | Yes | The learning rate starts from 0.001 and decreases by 0.1 every 15 epochs. The batch size, momentum, and weight decay are set to 256, 0.9, and 10⁻⁴, respectively, and the maximal epoch is set to 100. |
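
Below is a minimal sketch, assuming standard PyTorch 1.10 / torchvision APIs, of a training configuration matching the hyperparameters quoted in the Experiment Setup row (learning rate 0.001 decayed by 0.1 every 15 epochs, batch size 256, momentum 0.9, weight decay 10⁻⁴, up to 100 epochs) on CIFAR-10. The backbone (`resnet18`), the normalization statistics, and the data path are illustrative placeholders; no TREC-specific operator is reproduced here.

```python
# Hedged sketch of the training setup described in the paper, not the authors' code.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Standard CIFAR-10 normalization statistics (assumption, not quoted in the paper).
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)

# Placeholder backbone; the paper evaluates CifarNet, SqueezeNet, ZFNet, and ResNet-34.
model = torchvision.models.resnet18(num_classes=10)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-4)
# "Decreases by 0.1 every 15 epochs" read as multiplying the learning rate by 0.1.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)

for epoch in range(100):  # maximal epoch set to 100 in the paper
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```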