Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Non-local Convergence Analysis of Deep Linear Networks
Authors: Kun Chen, Dachao Lin, Zhihua Zhang
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Though gradient descent seldom converges to strict saddle points (Lee et al., 2016), we find that our analysis of gradient flow reveals the long stuck period of trajectory under gradient descent in practice and the transition of the convergence rates for trajectories. In this section, we conduct simple numerical experiments to verify our findings. |
| Researcher Affiliation | Academia | ¹School of Mathematical Sciences, Peking University. ²Academy for Advanced Interdisciplinary Studies, Peking University. |
| Pseudocode | No | The paper contains mathematical derivations and theoretical analyses but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper mentions "data matrices X" and "target matrix Z" and states "Data: XX^T = I_{d_x}" as an assumption, but it does not refer to any specific publicly available datasets by name (e.g., CIFAR-10, MNIST) nor does it provide links or citations to obtain the data used for numerical experiments. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, counts, or references to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper conducts numerical experiments but does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used. |
| Software Dependencies | No | The paper mentions using "gradient descent (GD)" but does not specify any software dependencies or their version numbers (e.g., libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | We run gradient descent (GD) for the problem (1) with a small learning rate 5e-4, and we artificially set s_i = d + 1 - i, i ∈ [d]. We choose N = 6, d = 5 with hidden-layer width (d_N, ..., d_0) = (5, 4, 1, 10, 5, 3, 8), and set different k ∈ [0 : d] in Theorem 3.1. |
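As a rough illustration of the setup described in the Experiment Setup row, the sketch below runs plain gradient descent on a deep linear network with the stated widths, learning rate, and target singular values. The sample count, initialization scale, and target construction are assumptions for the sketch; the paper's actual initialization and the role of k are governed by its Theorem 3.1, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Widths (d_N, ..., d_0) = (5, 4, 1, 10, 5, 3, 8): N = 6 layers,
# input dim d_0 = 8, output dim d_N = 5.
dims = [5, 4, 1, 10, 5, 3, 8][::-1]   # reordered to (d_0, ..., d_N)
lr = 5e-4                             # learning rate from the paper

# Whitened data with X X^T = I (assuming n = d_0 samples).
n = dims[0]
X = np.linalg.qr(rng.standard_normal((dims[0], n)))[0]

# Target Z with singular values s_i = d + 1 - i for d = 5, i.e. 5, 4, 3, 2, 1.
d = 5
U = np.linalg.qr(rng.standard_normal((dims[-1], dims[-1])))[0]
V = np.linalg.qr(rng.standard_normal((n, n)))[0]
S = np.zeros((dims[-1], n))
S[np.arange(d), np.arange(d)] = np.arange(d, 0, -1)
Z = U @ S @ V.T

# Small random initialization (an assumption; the paper's initialization
# is dictated by its Theorem 3.1).
Ws = [0.3 * rng.standard_normal((dims[j + 1], dims[j]))
      for j in range(len(dims) - 1)]

def loss(Ws):
    """Squared loss 0.5 * ||W_N ... W_1 X - Z||_F^2."""
    P = X
    for W in Ws:
        P = W @ P
    return 0.5 * np.linalg.norm(P - Z) ** 2

loss0 = loss(Ws)
for step in range(5000):
    # Forward pass, caching each layer's input.
    acts = [X]
    for W in Ws:
        acts.append(W @ acts[-1])
    # Backward pass through the linear layers.
    G = acts[-1] - Z
    for j in range(len(Ws) - 1, -1, -1):
        grad = G @ acts[j].T
        G = Ws[j].T @ G          # propagate using W_j before updating it
        Ws[j] -= lr * grad
```

Note the width-1 bottleneck in the stated architecture: the end-to-end product has rank at most 1, so the loss plateaus rather than vanishing, which is consistent with the paper's interest in long stuck periods of the trajectory.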