A Trace-restricted Kronecker-Factored Approximation to Natural Gradient

Authors: Kaixin Gao, Xiaolei Liu, Zhenghai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu | pp. 7519-7527

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method has better performance compared with several state-of-the-art algorithms on some deep network architectures. In this section, we evaluate TKFAC's performance on the auto-encoder and image classification tasks. Our experiments mainly consist of two parts.
Researcher Affiliation | Collaboration | Kaixin Gao (1), Xiaolei Liu (1), Zhenghai Huang (1), Min Wang (2), Zidong Wang (2), Dachuan Xu (3), Fan Yu (2); (1) School of Mathematics, Tianjin University, China; (2) Central Software Institute, Huawei Technologies Co. Ltd, China; (3) Department of Operations Research and Information Engineering, Beijing University of Technology, China
Pseudocode | Yes | Algorithm 1 gives high-level pseudocode of TKFAC_nor (see the hedged algorithm sketch after this table).
Open Source Code | No | No explicit statement of an open-source code release, nor a direct link to an implementation of the paper's method, was found. The provided link (https://www.mindspore.cn/) refers to a deep learning framework, and the paper states that applying TKFAC on it is "left for future work."
Open Datasets | Yes | Throughout this paper, we use three different datasets, MNIST (LeCun and Bottou 1998), CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009).
Dataset Splits | No | The paper mentions using specific datasets (MNIST, CIFAR-10, CIFAR-100) and describes general training parameters like epochs and batch size, but does not explicitly provide the training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning.
Hardware Specification | Yes | All experiments are run on a single RTX 2080Ti GPU using TensorFlow.
Software Dependencies | No | The paper mentions using TensorFlow but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | The hyperparameters, including the initial learning rate α, the damping parameter λ, and the parameter ν, are tuned using a grid search with values α ∈ {1e-4, 3e-4, ..., 1, 3}, λ ∈ {1e-8, 1e-6, 1e-4, 3e-4, 1e-3, ..., 1e-1, 3e-1}, and ν ∈ {1e-4, 1e-3, ..., 10}. The moving average parameter ε and the momentum are set to 0.95 and 0.9, respectively. The update intervals are set to T_FIM = T_INV = 100. All experiments are run for 200 epochs and repeated five times, with a batch size of 500 for MNIST and 128 for CIFAR-10/100 (see the configuration sketch after this table).
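
For orientation only, the following is a minimal sketch of what a trace-restricted Kronecker-factored preconditioner step for a single fully-connected layer might look like. It assumes the common KFAC form F ≈ c·(A ⊗ S), with the scalar c chosen so that the trace of the approximation matches a minibatch estimate of the trace of the exact Fisher block; the function name tkfac_style_update, the single shared damping term, and this particular trace-matching rule are assumptions, not a reproduction of the paper's Algorithm 1.

```python
# Hedged sketch (not the authors' Algorithm 1): a KFAC-style preconditioner for one
# fully-connected layer, with the Kronecker approximation rescaled so that its trace
# matches a minibatch estimate of the trace of the exact Fisher block.
import numpy as np

def tkfac_style_update(a, g, grad_W, damping=1e-3):
    """a: (n, d_in) layer inputs, g: (n, d_out) pre-activation gradients,
    grad_W: (d_out, d_in) minibatch gradient of the layer's weights."""
    n = a.shape[0]
    A = a.T @ a / n                      # input-side Kronecker factor
    S = g.T @ g / n                      # output-side Kronecker factor

    # Trace restriction: compare tr(F) ~ E[||a||^2 ||g||^2] (exact block trace)
    # with tr(A)tr(S) = E[||a||^2] E[||g||^2] (trace of the plain Kronecker approx).
    tr_F = np.mean(np.sum(a * a, axis=1) * np.sum(g * g, axis=1))
    c = tr_F / (np.trace(A) * np.trace(S) + 1e-12)

    # Damped inverses, then the usual Kronecker-factored natural-gradient step,
    # divided by the trace-matching scale c.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    S_inv = np.linalg.inv(S + damping * np.eye(S.shape[0]))
    return (S_inv @ grad_W @ A_inv) / c
```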
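
As a reading aid for the experiment setup row, here is a hedged sketch of the search loop those reported values imply. The grid values hidden behind the paper's "..." are filled in only by assuming the 1x/3x pattern of the quoted endpoints (powers of ten for ν), and train_and_evaluate is a hypothetical entry point, not a function from the paper.

```python
# Hedged sketch of the reported hyperparameter search; intermediate grid values are
# inferred from the quoted endpoints (an assumption), not enumerated in the paper.
import itertools

grid = {
    "alpha":  [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1.0, 3.0],   # initial learning rate
    "lambda": [1e-8, 1e-6, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1], # damping
    "nu":     [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0],
}
fixed = {
    "moving_average": 0.95,   # epsilon
    "momentum": 0.9,
    "T_FIM": 100,             # Fisher-statistics update interval
    "T_INV": 100,             # inverse update interval
    "epochs": 200,
    "repeats": 5,
    "batch_size": {"MNIST": 500, "CIFAR-10": 128, "CIFAR-100": 128},
}

for alpha, lam, nu in itertools.product(grid["alpha"], grid["lambda"], grid["nu"]):
    config = {"alpha": alpha, "lambda": lam, "nu": nu, **fixed}
    # train_and_evaluate(config)  # hypothetical training entry point
```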