Preconditioner on Matrix Lie Group for SGD
Authors: Xi-Lin Li
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on relatively large scale machine learning problems are reported for performance study. (Section 7, Experimental Results) |
| Researcher Affiliation | Industry | Xi-Lin Li, GMEMS Technologies and Spectimbre, 366 Fairview Way, Milpitas, CA 95035, lixilinx@gmail.com |
| Pseudocode | No | The paper describes the steps for the preconditioned SGD methods in numbered lists, but these are prose descriptions rather than structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have put our TensorFlow and PyTorch implementations on https://github.com/lixilinx. |
| Open Datasets | Yes | We consider the ImageNet ILSVRC2012 database for the image classification task. The Wikitext-2 database with 33278 tokens is considered. |
| Dataset Splits | No | The paper mentions evaluating on 'validation accuracy' (Section 7.2) and 'perplexity on validation set' (Section 7.3), but it does not specify the exact percentages or counts for the training, validation, and test splits of the datasets. |
| Hardware Specification | Yes | For the ImageNet experiment, all compared methods are implemented in Tensorflow, and require two days and a few hours to finish 40 epochs on a GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'PyTorch implementations' but does not specify their version numbers or any other software dependencies with versions. |
| Experiment Setup | Yes | Each compared method is trained with 40 epochs, mini-batch size 128, step size µ for the first 20 epochs, and 0.1µ for the last 20 epochs. For Adam, the initial step size is set to 0.00005. For batch normalization, initial step size is 0.002, and its moving average factors for momentum and statistics used for feature normalization are 0.9 and 0.99, respectively. The momentum method uses initial step size 0.002, and moving average factor 0.9 for momentum. Preconditioned SGD performs better with the scaling-and-normalization preconditioner. Its Q is initialized to 0.1I, and updated with normalized step size 0.01. For the Fisher type preconditioner, we set λ = 0.001 and initial step size 0.00005. For the Newton type preconditioner, its initial step size is 0.01. |
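
The step-size schedule quoted in the Experiment Setup row (step size µ for the first 20 epochs, 0.1µ for the last 20) can be sketched as below. This is a minimal illustration, not the author's code: the function name `step_size` and the dictionary `INITIAL_STEP_SIZES` with its keys are hypothetical, and the sketch assumes that the quoted initial step sizes play the role of µ for each method.

```python
# Hypothetical sketch of the piecewise-constant schedule described in the table:
# step size mu for the first half of training, 0.1 * mu for the second half.

# Initial step sizes as quoted in the Experiment Setup row.
# The dictionary name and keys are illustrative assumptions.
INITIAL_STEP_SIZES = {
    "adam": 0.00005,
    "batch_normalization": 0.002,
    "momentum": 0.002,
    "psgd_fisher": 0.00005,
    "psgd_newton": 0.01,
}


def step_size(epoch, mu, total_epochs=40):
    """Return the step size for a zero-based epoch index.

    Uses mu for epochs 0..19 and 0.1 * mu for epochs 20..39
    when total_epochs is 40.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must be in [0, total_epochs)")
    if epoch < total_epochs // 2:
        return mu
    return 0.1 * mu


if __name__ == "__main__":
    mu = INITIAL_STEP_SIZES["momentum"]
    for epoch in (0, 19, 20, 39):
        print(epoch, step_size(epoch, mu))
```

Epoch indices are zero-based here, so the drop to 0.1µ happens at index 20, the start of the 21st epoch.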