A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Authors: Kaixin Gao, Xiaolei Liu, Zhenghai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu | pp. 7519-7527

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method has better performance compared with several state-of-the-art algorithms on some deep network architectures. In this section, we evaluate TKFAC's performance on the auto-encoder and image classification tasks. Our experiments mainly consist of two parts.
Researcher Affiliation | Collaboration | Kaixin Gao (1), Xiaolei Liu (1), Zhenghai Huang (1), Min Wang (2), Zidong Wang (2), Dachuan Xu (3), Fan Yu (2); (1) School of Mathematics, Tianjin University, China; (2) Central Software Institute, Huawei Technologies Co. Ltd, China; (3) Department of Operations Research and Information Engineering, Beijing University of Technology, China
Pseudocode | Yes | Algorithm 1 gives high-level pseudocode of TKFAC_nor (see the hedged algorithm sketch after this table).
Open Source Code | No | No explicit statement of an open-source code release, nor a direct link to an implementation of the paper's method, was found. The provided link (https://www.mindspore.cn/) refers to a deep learning framework, and the paper states that applying TKFAC on it is "left for future work."
Open Datasets | Yes | Throughout this paper, we use three different datasets, MNIST (LeCun and Bottou 1998), CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009).
Dataset Splits | No | The paper mentions using specific datasets (MNIST, CIFAR-10, CIFAR-100) and describes general training parameters like epochs and batch size, but does not explicitly provide the training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning.
Hardware Specification | Yes | All experiments are run on a single RTX 2080Ti GPU using TensorFlow.
Software Dependencies | No | The paper mentions using TensorFlow but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | The hyperparameters, including the initial learning rate α, the damping parameter λ, and the parameter ν, are tuned using a grid search with values α ∈ {1e-4, 3e-4, ..., 1, 3}, λ ∈ {1e-8, 1e-6, 1e-4, 3e-4, 1e-3, ..., 1e-1, 3e-1}, and ν ∈ {1e-4, 1e-3, ..., 10}. The moving average parameter ε and the momentum are set to 0.95 and 0.9, respectively. The update intervals are set to T_FIM = T_INV = 100. All experiments are run for 200 epochs and repeated five times, with a batch size of 500 for MNIST and 128 for CIFAR-10/100 (see the configuration sketch after this table).
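
For orientation only, the following is a minimal sketch of what a trace-restricted Kronecker-factored preconditioner step for a single fully-connected layer might look like. It assumes the common KFAC form F ≈ c·(A ⊗ S), with the scalar c chosen so that the trace of the approximation matches a minibatch estimate of the trace of the exact Fisher block; the function name tkfac_style_update, the single shared damping term, and this particular trace-matching rule are assumptions, not a reproduction of the paper's Algorithm 1.

```python
# Hedged sketch (not the authors' Algorithm 1): a KFAC-style preconditioner for one
# fully-connected layer, with the Kronecker approximation rescaled so that its trace
# matches a minibatch estimate of the trace of the exact Fisher block.
import numpy as np

def tkfac_style_update(a, g, grad_W, damping=1e-3):
    """a: (n, d_in) layer inputs, g: (n, d_out) pre-activation gradients,
    grad_W: (d_out, d_in) minibatch gradient of the layer's weights."""
    n = a.shape[0]
    A = a.T @ a / n                      # input-side Kronecker factor
    S = g.T @ g / n                      # output-side Kronecker factor

    # Trace restriction: compare tr(F) ~ E[||a||^2 ||g||^2] (exact block trace)
    # with tr(A)tr(S) = E[||a||^2] E[||g||^2] (trace of the plain Kronecker approx).
    tr_F = np.mean(np.sum(a * a, axis=1) * np.sum(g * g, axis=1))
    c = tr_F / (np.trace(A) * np.trace(S) + 1e-12)

    # Damped inverses, then the usual Kronecker-factored natural-gradient step,
    # divided by the trace-matching scale c.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    S_inv = np.linalg.inv(S + damping * np.eye(S.shape[0]))
    return (S_inv @ grad_W @ A_inv) / c
```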
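
As a reading aid for the experiment setup row, here is a hedged sketch of the search loop those reported values imply. The grid values hidden behind the paper's "..." are filled in only by assuming the 1x/3x pattern of the quoted endpoints (powers of ten for ν), and train_and_evaluate is a hypothetical entry point, not a function from the paper.

```python
# Hedged sketch of the reported hyperparameter search; intermediate grid values are
# inferred from the quoted endpoints (an assumption), not enumerated in the paper.
import itertools

grid = {
    "alpha":  [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1.0, 3.0],   # initial learning rate
    "lambda": [1e-8, 1e-6, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1], # damping
    "nu":     [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0],
}
fixed = {
    "moving_average": 0.95,   # epsilon
    "momentum": 0.9,
    "T_FIM": 100,             # Fisher-statistics update interval
    "T_INV": 100,             # inverse update interval
    "epochs": 200,
    "repeats": 5,
    "batch_size": {"MNIST": 500, "CIFAR-10": 128, "CIFAR-100": 128},
}

for alpha, lam, nu in itertools.product(grid["alpha"], grid["lambda"], grid["nu"]):
    config = {"alpha": alpha, "lambda": lam, "nu": nu, **fixed}
    # train_and_evaluate(config)  # hypothetical training entry point
```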