Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data
Authors: Dachao Lin, Ruoyu Sun, Zhihua Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also characterize our findings in experiments. In this section we conduct experiments to verify our theoretical analyses. |
| Researcher Affiliation | Academia | Dachao Lin¹, Ruoyu Sun², Zhihua Zhang³. ¹Academy for Advanced Interdisciplinary Studies, Peking University; ²Department of Industrial and Enterprise Systems Engineering, Coordinated Science Lab (affiliated), University of Illinois Urbana-Champaign; ³School of Mathematical Sciences, Peking University |
| Pseudocode | No | The paper describes methods mathematically and in text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | We construct a simple dataset with x ∼ U(S¹) and y(x) = sgn(v^⊤x) with v = (0, 1)^⊤. This describes a synthetic data generation process (sketched after this table), not a publicly available dataset with concrete access details such as a URL, DOI, or repository. |
| Dataset Splits | No | The paper uses a synthetically generated dataset and discusses training with SGD, but it does not specify explicit train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'stochastic gradient descent (SGD)' but does not list any specific software libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would allow replication. |
| Experiment Setup | Yes | We use common stochastic gradient descent (SGD) with batch size 1000 and a constant small learning rate of 10^-3. Moreover, we choose the initial value w(0) = w_e(0) = (0.6, 0.8)^⊤. In the deep linear network, we set W_N(0) = u_N^⊤ and W_i(0) = u_{i+1} u_i^⊤ with ‖u_i‖ = 1, i = 2, …, N, and u_1 = w_e(0), to satisfy the balancedness conditions of Eq. (13) (see the initialization sketch after this table). |
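
For concreteness, below is a minimal sketch of the synthetic data generation quoted in the Open Datasets row: x ∼ U(S¹) labeled by y(x) = sgn(v^⊤x) with v = (0, 1)^⊤. The function name, sample count, and seed are our own choices; the paper specifies only the distribution and the labeling rule.

```python
import numpy as np

def sample_circle_dataset(n, seed=0):
    """Draw n points uniformly from the unit circle S^1 and label them
    by y(x) = sgn(v^T x) with v = (0, 1), i.e. the sign of x's second coordinate."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)         # uniform angles give x ~ U(S^1)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # points on the unit circle
    v = np.array([0.0, 1.0])
    y = np.sign(x @ v)
    return x, y

# Example: one batch of the size used for SGD in the paper.
x_batch, y_batch = sample_circle_dataset(n=1000)
```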
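Similarly, here is a sketch of the balanced initialization described in the Experiment Setup row: W_N(0) = u_N^⊤ and W_i(0) = u_{i+1} u_i^⊤ with unit-norm u_i and u_1 = w_e(0) = (0.6, 0.8)^⊤, so that the end-to-end product W_N ⋯ W_1 equals w_e(0)^⊤ and the balancedness conditions (Eq. (13) in the paper) hold. Drawing u_2, …, u_N as random unit vectors is our assumption; the paper states only the norm constraints.

```python
import numpy as np

def init_balanced_deep_linear(N, w_e0, seed=0):
    """Initialize an N-layer deep linear network on R^2 so that
    W_{i+1}(0)^T W_{i+1}(0) = W_i(0) W_i(0)^T (balancedness) and
    W_N(0) ... W_1(0) = w_e(0)^T."""
    rng = np.random.default_rng(seed)
    u = [np.asarray(w_e0, dtype=float)]   # u_1 = w_e(0), unit norm by choice of (0.6, 0.8)
    for _ in range(N - 1):                # u_2, ..., u_N: random unit vectors (our assumption)
        g = rng.standard_normal(2)
        u.append(g / np.linalg.norm(g))
    Ws = [np.outer(u[i], u[i - 1]) for i in range(1, N)]  # W_i = u_{i+1} u_i^T, i = 1..N-1
    Ws.append(u[N - 1][None, :])          # W_N = u_N^T, a 1x2 row vector
    return Ws

# Sanity check: the end-to-end product recovers w_e(0).
Ws = init_balanced_deep_linear(N=3, w_e0=[0.6, 0.8])
w_e = Ws[-1]
for W in reversed(Ws[:-1]):
    w_e = w_e @ W
print(w_e)  # -> [[0.6 0.8]] up to floating-point error
```

The sanity check works because each intermediate product u_i^⊤ u_i collapses to 1 under the unit-norm constraint, leaving exactly w_e(0)^⊤; this is the same telescoping that makes the balancedness conditions hold at initialization.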