A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Authors: Kaixin Gao, Xiaolei Liu, Zhenghai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu
AAAI 2021, pp. 7519-7527 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method has better performance compared with several state-of-the-art algorithms on some deep network architectures. In this section, we evaluate TKFAC's performance on the auto-encoder and image classification tasks. Our experiments mainly consist of two parts. |
| Researcher Affiliation | Collaboration | Kaixin Gao1 , Xiaolei Liu1 , Zhenghai Huang1 , Min Wang2, Zidong Wang2, Dachuan Xu3 , Fan Yu2 1 School of Mathematics, Tianjin University, China 2 Central Software Institute, Huawei Technologies Co. Ltd, China 3 Department of Operations Research and Information Engineering, Beijing University of Technology, China |
| Pseudocode | Yes | Algorithm 1 gives high-level pseudocode of TKFAC_nor. |
| Open Source Code | No | No explicit statement of open-source code release or a direct link to the paper's methodology implementation was found. The provided link (https://www.mindspore.cn/) refers to a deep learning computing framework, and the paper states that applying TKFAC on it is "left for future work." |
| Open Datasets | Yes | Throughout this paper, we use three different datasets, MNIST (Lecun and Bottou 1998), CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009). |
| Dataset Splits | No | The paper mentions using specific datasets (MNIST, CIFAR-10, CIFAR-100) and describes general training parameters like epochs and batch size, but does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | All experiments are run on a single RTX 2080Ti GPU using TensorFlow. |
| Software Dependencies | No | The paper mentions using TensorFlow but does not specify a version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters, including the initial learning rate α, the damping parameter λ, and the parameter ν, are tuned using a grid search with values α ∈ {1e-4, 3e-4, ..., 1, 3}, λ ∈ {1e-8, 1e-6, 1e-4, 3e-4, 1e-3, ..., 1e-1, 3e-1}, and ν ∈ {1e-4, 1e-3, ..., 10}. The moving average parameter ε and the momentum are set to 0.95 and 0.9, respectively. The update intervals are set to T_FIM = T_INV = 100. All experiments are run for 200 epochs and repeated five times, with a batch size of 500 for MNIST and 128 for CIFAR-10/100. |
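
The Pseudocode row notes that Algorithm 1 in the paper gives high-level pseudocode of TKFAC_nor; the paper itself is the reference for that algorithm. For orientation only, below is a minimal NumPy sketch of a generic Kronecker-factored (K-FAC-style) natural-gradient step for a fully connected layer, with an illustrative trace-based rebalancing of the two factors. The function name, the rebalancing rule, and the default hyperparameter values are assumptions; this does not reproduce TKFAC_nor's exact trace restriction or normalization.

```python
# Generic K-FAC-style preconditioned update for one fully connected layer.
# Illustrative only; not the paper's TKFAC_nor algorithm.
import numpy as np

def kfac_style_update(W, grad_W, a, g, damping=1e-3, lr=1e-3):
    """One preconditioned update for weights W of shape (out_dim, in_dim).

    a: layer inputs (activations), shape (batch, in_dim)
    g: back-propagated gradients w.r.t. layer outputs, shape (batch, out_dim)
    """
    batch = a.shape[0]
    # Kronecker factors: input covariance A and output-gradient covariance G.
    A = a.T @ a / batch                      # (in_dim, in_dim)
    G = g.T @ g / batch                      # (out_dim, out_dim)

    # Illustrative trace-based rebalancing: rescale the factors so both carry
    # the same trace (this leaves the Kronecker product A ⊗ G unchanged and is
    # only meant to evoke the trace-aware flavor of TKFAC, not its exact rule).
    scale = np.sqrt(np.trace(A) / max(np.trace(G), 1e-12))
    A, G = A / scale, G * scale

    # Damped inverses of the two small factor matrices.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))

    # Preconditioned gradient: (G ⊗ A)^{-1} vec(grad_W) == G^{-1} grad_W A^{-1}.
    precond_grad = G_inv @ grad_W @ A_inv
    return W - lr * precond_grad
```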
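
The Experiment Setup row fixes most of the training protocol explicitly. As a rough sketch under those stated values, the snippet below collects the fixed settings and wraps the grid search in a helper. `train_and_evaluate` is a hypothetical callback standing in for one full training run, the keyword names in `FIXED` are illustrative, and the α/λ/ν candidate lists are left to the caller because the paper elides parts of each grid with "...".

```python
# Sketch of the grid search and fixed settings from the Experiment Setup row.
from itertools import product

FIXED = {
    "moving_average": 0.95,   # ε, moving-average parameter
    "momentum": 0.9,
    "t_fim": 100,             # T_FIM, Fisher-factor update interval
    "t_inv": 100,             # T_INV, inverse update interval
    "epochs": 200,
    "repeats": 5,             # each configuration is repeated five times
}

def batch_size(dataset: str) -> int:
    """500 for MNIST, 128 for CIFAR-10/100, as stated in the paper."""
    return 500 if dataset == "mnist" else 128

def grid_search(train_and_evaluate, lr_grid, damping_grid, nu_grid, dataset):
    """Exhaustive search over (α, λ, ν); returns the best score and setting."""
    best = None
    for lr, damping, nu in product(lr_grid, damping_grid, nu_grid):
        # train_and_evaluate is a hypothetical stand-in for one training run
        # that returns a scalar validation score.
        score = train_and_evaluate(
            lr=lr, damping=damping, nu=nu,
            batch_size=batch_size(dataset), **FIXED,
        )
        if best is None or score > best[0]:
            best = (score, {"lr": lr, "damping": damping, "nu": nu})
    return best
```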