Winograd Algorithm for AdderNet

Authors: Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both FPGA and benchmarks show that the new method can further reduce the energy consumption without affecting the accuracy of the original AdderNet.
Researcher Affiliation | Collaboration | (1) Noah's Ark Lab, Huawei Technologies; (2) Peking University; (3) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences.
Pseudocode | No | The paper provides mathematical formulations and derivations, but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | Here we conduct experiments to show the effectiveness of our proposed Winograd algorithm for AdderNet. The experiments are done on several commonly used datasets, including MNIST, CIFAR and ImageNet. For all experiments, we use the transform matrices A0 and G0; other Ai and Gi matrices achieve similar results. (The standard Winograd transforms are sketched after the table for reference.)
Dataset Splits | No | The paper mentions training on CIFAR and ImageNet and refers to the settings of (He et al., 2016), which implies standard splits, but it does not explicitly state the train/validation/test splits with percentages, absolute counts, or direct citations for the splits themselves.
Hardware Specification | Yes | All experiments are run via PyTorch on an NVIDIA Tesla V100 GPU. To evaluate the energy efficiency of our method at runtime, we implement the Winograd algorithm for AdderNet and the original AdderNet on FPGA.
Software Dependencies | No | The paper mentions PyTorch but does not provide its version number or any other software dependencies with specific version information.
Experiment Setup | Yes | The learning rate is set to 0.1 at the beginning and decays with a cosine schedule over the following 100 epochs. We use the SGD optimizer with momentum 0.9, and the batch size is set to 256. The hyper-parameter η in Equation (5) is set to 0.1. The total number of training epochs is 150. The weight decay is set to 0.0001 and the momentum is 0.9. The hyper-parameter η is set to 0.05 for the Winograd AdderNet. (A training-configuration sketch with these hyper-parameters follows the table.)
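
Because the Open Datasets row mentions the transform matrices A0 and G0, here is a minimal Python sketch of the standard 1-D Winograd F(2,3) transforms (A^T, G, B^T) from Lavin & Gray (2016). These are the classical multiplication-based matrices, shown only to make the role of the transform matrices concrete; the adder-specific A0/G0 and the ℓ1 reformulation derived in the paper are not reproduced here.

```python
import numpy as np

# Standard 1-D Winograd F(2,3) transforms (Lavin & Gray, 2016).
# NOTE: classical multiplication-based matrices; the paper's A0/G0 for the
# adder (L1-distance) formulation are defined in the paper and differ.
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 1-D correlation of a length-4 input tile d with a
    length-3 filter g, computed via the Winograd F(2,3) algorithm."""
    U = G @ g    # transform the filter
    V = BT @ d   # transform the input tile
    M = U * V    # element-wise product (the paper replaces this
                 # multiplication-based step in its adder variant)
    return AT @ M  # inverse transform back to the output domain

# Sanity check against the direct sliding-window correlation.
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
direct = np.array([np.dot(d[i:i + 3], g) for i in range(2)])
assert np.allclose(winograd_f23(d, g), direct)
```

In the Winograd-for-AdderNet formulation, the element-wise product step is where the multiplication-free aggregation enters; the concrete matrices A0 and G0 used there are given in the paper.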
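
For the Experiment Setup row, the following is a minimal PyTorch sketch that wires up the reported hyper-parameters (SGD with momentum 0.9 and weight decay 0.0001, batch size 256, initial learning rate 0.1 with cosine decay over 100 epochs, 150 epochs in total). The model and data are placeholders, not the paper's AdderNet or Winograd-AdderNet layers, and the hyper-parameter η of Equation (5) is specific to AdderNet's gradient scaling and has no counterpart in this generic sketch.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder classifier standing in for the AdderNet / Winograd-AdderNet model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

optimizer = optim.SGD(model.parameters(),
                      lr=0.1,             # reported initial learning rate
                      momentum=0.9,       # reported momentum
                      weight_decay=1e-4)  # reported weight decay
scheduler = CosineAnnealingLR(optimizer, T_max=100)  # cosine decay over 100 epochs
criterion = nn.CrossEntropyLoss()

batch_size, total_epochs = 256, 150  # reported batch size and total epochs
for epoch in range(total_epochs):
    images = torch.randn(batch_size, 3, 32, 32)     # stand-in for a CIFAR batch
    labels = torch.randint(0, 10, (batch_size,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The T_max of 100 mirrors the "following 100 epochs" wording; how the learning rate behaves over the remaining epochs of the 150-epoch run is not specified in the excerpt.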