Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks

Authors: Guodong Zhang, James Martens, Roger B. Grosse

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we analyze for the first time the speed of convergence of natural gradient descent on nonlinear neural networks with squared-error loss. We identify two conditions which guarantee efficient convergence from random initializations: (1) the Jacobian matrix (of the network's output for all training cases with respect to the parameters) has full row rank, and (2) the Jacobian matrix is stable for small perturbations around the initialization. For two-layer ReLU neural networks, we prove that these two conditions do in fact hold throughout the training, under the assumptions of nondegenerate inputs and overparameterization. (A minimal sketch of the analyzed update appears after the table.)
Researcher Affiliation | Collaboration | Guodong Zhang (1,2), James Martens (3), Roger Grosse (1,2); 1: University of Toronto, 2: Vector Institute, 3: DeepMind; {gdzhang, rgrosse}@cs.toronto.edu, jamesmartens@google.com
Pseudocode | No | No pseudocode or algorithm block was found.
Open Source Code | No | The paper does not provide an unambiguous statement or link to open-source code for the methodology described.
Open Datasets | No | Figure 1 of the paper ("Visualization of natural gradient update and gradient descent update in the output space (for a randomly initialized network)") takes two classes (4 and 9) from MNIST [LeCun et al., 1998] and generates the targets (denoted as stars in the figure) by f(x) = x · 0.5 + 0.3 · N(0, I), where x ∈ R^2 is the one-hot target. This is an illustrative visualization, not a primary experimental setup, and no direct access information (link, DOI) is provided for the data used in this visualization. (A sketch of this target construction appears after the table.)
Dataset Splits | No | The paper focuses on theoretical analysis and proofs, not empirical experiments with specified training, validation, and test dataset splits.
Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, memory) used for any computations or visualizations is mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x) are mentioned in the paper.
Experiment Setup | No | The paper is theoretical and focuses on convergence analysis. While it discusses quantities such as an O(1) step size, these are theoretical bounds rather than concrete hyperparameters for an empirical experiment. No specific details about training configurations or system-level settings for experimental reproduction are provided.
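For the convergence result summarized in the Research Type row, the following is a minimal NumPy sketch (not the authors' code) of one exact natural gradient step, theta <- theta - eta * J^T (J J^T)^{-1} (u - y), on a two-layer ReLU network with squared-error loss. The width, data, 1/sqrt(m) output scaling, fixed +/-1 output weights, and the jacobian helper are all illustrative assumptions; the paper itself provides no implementation.

```python
# Minimal sketch (not the authors' code): one step of exact natural gradient
# descent on a two-layer ReLU network u_i = (1/sqrt(m)) * a^T relu(W x_i)
# with squared-error loss. Only W is trained; a is fixed (an assumed setup).
import numpy as np

n, d, m = 50, 10, 4096                  # samples, input dim, hidden width (assumed)
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm ("nondegenerate") inputs
y = rng.standard_normal(n)                        # regression targets

W = rng.standard_normal((m, d))                   # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m)               # fixed output weights

def forward(W):
    """Network outputs u(W) in R^n."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def jacobian(W):
    """J in R^{n x (m*d)} with row i equal to du_i / d vec(W)."""
    pre = X @ W.T                                 # (n, m) pre-activations
    act = (pre > 0) * a / np.sqrt(m)              # (n, m): a_r * 1[pre > 0] / sqrt(m)
    return (act[:, :, None] * X[:, None, :]).reshape(n, m * d)

eta = 1.0                                         # O(1) step size, as in the theory
u, J = forward(W), jacobian(W)
# Min-norm natural gradient direction J^T (J J^T)^{-1} (u - y); the n x n
# system is solvable when the Jacobian has full row rank (condition (1) above).
step = J.T @ np.linalg.solve(J @ J.T, u - y)
W = W - eta * step.reshape(m, d)

print("squared-error loss before/after:",
      0.5 * np.sum((u - y) ** 2), 0.5 * np.sum((forward(W) - y) ** 2))
```

The small n x n linear solve is only feasible here because the Fisher matrix for squared-error loss reduces to a Gram matrix built from the network Jacobian; this sketch is meant to make the two conditions in the abstract concrete, not to reproduce the paper's analysis.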
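The target construction quoted in the Open Datasets row can likewise be sketched in a few lines. Real MNIST loading is replaced with placeholder labels, since the report notes that no access information for the data is given.

```python
# Sketch of the Figure 1 target construction: take two MNIST classes (4 and 9)
# and set targets f(x) = x * 0.5 + 0.3 * N(0, I), with x in R^2 the one-hot label.
# The labels below are placeholders; any MNIST loader would supply the real ones.
import numpy as np

rng = np.random.default_rng(0)
digits = rng.choice([4, 9], size=128)                            # placeholder labels
x = np.stack([digits == 4, digits == 9], axis=1).astype(float)   # one-hot in R^2
targets = x * 0.5 + 0.3 * rng.standard_normal(x.shape)           # "stars" in Figure 1
```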