THOR, Trace-based Hardware-driven Layer-Oriented Natural Gradient Descent Computation
Authors: Mengyun Chen, Kaixin Gao, Xiaolei Liu, Zidong Wang, Ningxi Ni, Qian Zhang, Lei Chen, Chao Ding, Zhenghai Huang, Min Wang, Shuangling Wang, Fan Yu, Xinyuan Zhao, Dachuan Xu
AAAI 2021, pp. 7046-7054
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of THOR, we have conducted extensive experiments. The results show that training ResNet-50 on ImageNet with THOR only takes 66.7 minutes to achieve a top-1 accuracy of 75.9% under an 8 Ascend 910 environment with MindSpore, a new deep learning computing framework. |
| Researcher Affiliation | Collaboration | 1 Huawei Technologies Co. Ltd, 2 Tianjin University, 3 Beijing University of Technology, 4 Hong Kong University of Science and Technology, 5 Chinese Academy of Sciences |
| Pseudocode | Yes | Algorithm 1 THOR |
| Open Source Code | Yes | Furthermore, part of our algorithm has been open sourced¹, and the code will continue to be improved in the future. ¹THOR: https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet_thor |
| Open Datasets | Yes | To test the performance, we apply THOR to train ResNet-18 for CIFAR-10 and ResNet-50 for ImageNet. |
| Dataset Splits | No | The paper states that ResNet-18 is trained on CIFAR-10 and ResNet-50 on ImageNet but does not explicitly specify the training/validation/test dataset splits using percentages, counts, or specific references to pre-defined validation splits. |
| Hardware Specification | Yes | The results show that training ResNet-50 on ImageNet with THOR only takes 66.7 minutes to achieve a top-1 accuracy of 75.9% under an 8 Ascend 910 environment with MindSpore... In this experiment, we use PyTorch on 1 Tesla V100... we implement THOR on MindSpore with 8 Ascend 910 |
| Software Dependencies | No | The paper mentions using 'MindSpore' and 'PyTorch' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In this experiment, we use PyTorch on 1 Tesla V100 and train ResNet-18 on CIFAR-10 with batch-size 128. We set the same learning rate for Momentum, KFAC, THOR, THOR_stop and THOR_NT, and the same damping for KFAC, THOR, THOR_stop and THOR_NT. The learning rate α(e) for epoch e and the damping λ(e) are defined as follows: α(e) = 0.1 · 10^⌊e/70⌋, λ(e) = 0.3 · 10^⌊e/70⌋. The weight decay for Momentum, KFAC, THOR, THOR_stop and THOR_NT is set to 0.0005. The trace thresholds are set to (ω1, ω2) = (0.01, 0) for THOR, (ω1, ω2) = (0.01, 0.001) for THOR_stop and (ω1, ω2) = (0, 0) for THOR_NT. The update interval for KFAC is set to 20. ... In this experiment, we implement THOR on MindSpore with 8 Ascend 910 and train ResNet-50 on ImageNet with batch-size 256. The weight decay for these methods is set to 0.0005 and the label smoothing is set to 0.1. The trace thresholds are set to (ω1, ω2) = (0.01, 0) for THOR, (ω1, ω2) = (0.01, 0.001) for THOR_stop and (ω1, ω2) = (0, 0) for THOR_NT. Split dimension, learning rate, damping and update interval can be found in Figure 8. The learning rate α(e) for epoch e is determined as follows: α(e) = α_target · (1 − e/e_end)^p_decay. The damping λ adopts the following decreasing rule: λ(e) = λ(0) · ρ_decay^⌊e/10⌋. The hyper-parameters for our methods are shown in Table 3. (See the schedule sketch below the table.) |
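
The learning-rate and damping schedules quoted in the Experiment Setup row are compact enough to express directly. The sketch below is a minimal, hedged reconstruction of those formulas, not the authors' released code: the function and variable names are illustrative, the negative exponent in the step schedule is an assumption made so the CIFAR-10 values decay rather than grow, and the ImageNet constants (α_target, e_end, p_decay, the damping decay rate) must be taken from Table 3 of the paper.

```python
import math

# Hedged sketch of the schedules reported in the reproducibility table.
# Names and the sign of the step-schedule exponent are assumptions, not
# the authors' implementation.

def step_schedule(epoch, base, interval=70):
    """Step schedule base * 10^(-floor(epoch / interval)).

    The table quotes 0.1 * 10^floor(e/70) for the CIFAR-10 learning rate and
    0.3 * 10^floor(e/70) for the damping; a negative exponent is assumed here
    so that the value decreases as training proceeds.
    """
    return base * 10.0 ** (-math.floor(epoch / interval))

def poly_decay_lr(epoch, alpha_target, e_end, p_decay):
    """ImageNet learning rate: alpha_target * (1 - e / e_end)^p_decay."""
    return alpha_target * (1.0 - epoch / e_end) ** p_decay

# Example with the CIFAR-10 constants quoted above (lr 0.1, damping 0.3).
lr_at_0 = step_schedule(0, 0.1)       # 0.1
lr_at_75 = step_schedule(75, 0.1)     # 0.01 under the assumed decay reading
damping_at_0 = step_schedule(0, 0.3)  # 0.3
```

Under this reading, both CIFAR-10 quantities drop by a factor of 10 every 70 epochs, while the ImageNet learning rate follows a polynomial decay to zero at epoch e_end; the actual constants and the exact damping rule should be checked against Table 3 and Figure 8 of the paper.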