Tractable structured natural-gradient descent using local parameterizations

Authors: Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show results on a range of problems from deep learning, variational inference, and evolution strategies. We show applications to various problems for search, variational inference, and deep learning, obtaining much faster convergence than methods that ignore geometry. An example for 1-D logistic regression is shown in Fig. 1(III). Overall, our work opens a new direction to design efficient and structured geometric methods via local parameterizations. Numerical Results: We present results on problems involving search, inference, optimization, and deep learning; Table 1 in Appx. A summarizes our updates.
Researcher Affiliation | Collaboration | 1) University of British Columbia; 2) Sony Computer Science Laboratories Inc.; 3) RIKEN Center for Advanced Intelligence Project; 4) CIFAR AI Chair, Alberta Machine Intelligence Institute.
Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | We randomly generate x_i and Q with d = 50, N_train = 125,000 for training and N_test = 25,000 for testing. We consider a case with K = 40, C = 10, d = 80, s = 20. We train the model with our updates derived from matrix Gaussian (see Appx. I) for each layer-wise matrix weight on the CIFAR-10 and STL-10 datasets.
Dataset Splits | No | The paper specifies N_train = 125,000 for training and N_test = 25,000 for testing in one experiment, but it does not provide explicit details about validation splits for any of the reported experiments.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper describes the methods and results but does not specify any software dependencies with version numbers (e.g., specific Python library versions, TensorFlow/PyTorch versions).
Experiment Setup | Yes | All methods are trained using mini-batches, where the mini-batch size is 100. We consider a case with K = 40, C = 10, d = 80, s = 20. We set d = 200 and γ = 1 in (1). For CIFAR-10 and STL-10, we train the model with mini-batch size 20. We employ L2 regularization with weight 10^-2. We use the same initialization and hyper-parameters in all methods. We report results in terms of test accuracy, averaged over 5 runs with distinct random seeds. (A configuration sketch collecting these values follows the table.)
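
For readability, the quoted setup values can be collected into a single configuration sketch. The snippet below is a hypothetical summary, not code from the paper: the dictionary name, grouping, and key names are illustrative assumptions, and only the numeric values come from the quotes above (the L2 weight is assumed to be the negative power 10^-2).

# Hypothetical summary of the quoted experiment setup.
# Only the numeric values are taken from the paper's text; the names,
# grouping, and structure are illustrative assumptions.
EXPERIMENT_SETUP = {
    "minibatch_size": 100,            # "All methods are trained using mini-batches ... size 100"
    "synthetic_regression": {         # randomly generated x_i and Q
        "d": 50,
        "n_train": 125_000,
        "n_test": 25_000,
    },
    "mixture_case": {                 # "K = 40, C = 10, d = 80, s = 20"
        "K": 40, "C": 10, "d": 80, "s": 20,
    },
    "objective_in_eq_1": {            # "d = 200 and gamma = 1 in (1)"
        "d": 200, "gamma": 1,
    },
    "image_classification": {         # matrix-Gaussian updates per layer-wise weight
        "datasets": ["CIFAR-10", "STL-10"],
        "minibatch_size": 20,         # smaller batches quoted for these datasets
        "l2_weight": 1e-2,            # assumed reading of the quoted L2 weight
    },
    "evaluation": {
        "metric": "test accuracy",
        "num_seeds": 5,               # results averaged over 5 distinct random seeds
    },
}

Grouping the values this way only makes it easier to see which numbers belong to which experiment; it implies nothing about how the original code was organized.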