Winograd Algorithm for AdderNet
Authors: Wenshuo Li, Hanting Chen, Mingqiang Huang, Xinghao Chen, Chunjing Xu, Yunhe Wang
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both FPGA and benchmarks show that the new method can further reduce the energy consumption without affecting the accuracy of the original AdderNet. |
| Researcher Affiliation | Collaboration | Noah's Ark Lab, Huawei Technologies; Peking University; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. |
| Pseudocode | No | The paper provides mathematical formulations and derivations, but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | Here we conduct experiments to show the effectiveness of our proposed Winograd algorithm for AdderNet. The experiments are done on several commonly used datasets, including MNIST, CIFAR, and ImageNet. For all experiments, we use the transform matrices A0 and G0; other Ai and Gi matrices achieve similar results (see the Winograd sketch after the table). |
| Dataset Splits | No | The paper mentions training on CIFAR and ImageNet and refers to the settings of (He et al., 2016), which implies standard splits. However, it does not explicitly state the train/validation/test splits with percentages, absolute counts, or direct citations for the splits themselves. |
| Hardware Specification | Yes | All experiments are run via PyTorch on an NVIDIA Tesla V100 GPU. To evaluate the runtime energy efficiency of the method, the authors implement both the Winograd algorithm for AdderNet and the original AdderNet on FPGA. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide its version number or any other software dependencies with specific version information. |
| Experiment Setup | Yes | The learning rate is set to 0.1 at the beginning and decays with a cosine schedule over the following 100 epochs. The SGD optimizer is used with momentum 0.9, and the batch size is set to 256. The hyper-parameter η in Equation (5) is set to 0.1. The total number of training epochs is 150. The weight decay is set to 0.0001 and the momentum is 0.9. The hyper-parameter η is set to 0.05 for the Winograd AdderNet (see the training sketch after the table). |
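
To make the transform-matrix remark concrete, the sketch below uses the standard Winograd F(2×2, 3×3) matrices (Lavin & Gray, 2016) and swaps the usual element-wise multiplication for an AdderNet-style L1 operation. The matrices `BT`, `G`, `AT` and the function `winograd_adder_tile` are illustrative assumptions; the paper's specific A0/G0 matrices and its exact formulation are not reproduced here.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray, 2016),
# used only as a stand-in for the paper's A0/G0.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_adder_tile(d, g):
    """Map one 4x4 input tile d and one 3x3 filter g to a 2x2 output tile.
    Conventional Winograd multiplies the transformed tiles element-wise;
    this sketch replaces that product with a negative absolute difference,
    mimicking AdderNet's L1-based similarity (an assumption, not the
    paper's exact formulation)."""
    U = G @ g @ G.T       # 4x4 transformed filter
    V = BT @ d @ BT.T     # 4x4 transformed input tile
    M = -np.abs(U - V)    # AdderNet-style element-wise operation
    return AT @ M @ AT.T  # inverse transform to the 2x2 output tile
```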
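
The quoted training hyper-parameters translate into a minimal PyTorch sketch as follows. The torchvision ResNet-18 is only a stand-in for the AdderNet model, and `train_one_epoch` and `train_loader` are hypothetical names; the adder layers and the hyper-parameter η are not reproduced here.

```python
import torch
from torch import nn, optim
from torchvision import models

# Stand-in model: the paper trains AdderNet variants, not a standard ResNet.
model = models.resnet18(num_classes=10)

# Hyper-parameters quoted in the summary: SGD, lr 0.1, momentum 0.9,
# weight decay 1e-4, batch size 256, cosine learning-rate decay.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# T_max=150 is an assumption: the quoted text mentions both a 100-epoch
# cosine decay and 150 total training epochs.
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# for epoch in range(150):
#     train_one_epoch(train_loader)  # train_loader: CIFAR/ImageNet loader, batch size 256
#     scheduler.step()
```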