Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Authors: Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are provided to support our theory. |
| Researcher Affiliation | Academia | Yuqing Wang, Minshuo Chen, Tuo Zhao, Molei Tao; Georgia Institute of Technology; {ywang3398, mchen393, tourzhao, mtao}@gatech.edu |
| Pseudocode | No | The paper describes mathematical update rules for Gradient Descent but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating the elements of matrix A from a Gaussian distribution and generating the initial conditions (X0, Y0) randomly. It does not mention using any publicly available or open datasets with concrete access information. (A matrix-case sketch of this data generation appears after the table.) |
| Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions Gradient Descent (GD) but does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The initial conditions are randomly generated, with (x0, y0) = (9, 1), (x0, y0) = (19, 1), and (x0, y0) = (99, 1) respectively; the learning rates are chosen within the range of Theorem 3.1, from large to small, as h0, (6/7)h0, ..., (2/7)h0 for the 1st-6th columns respectively, where h0 = 4/(x0^2 + y0^2 + 8). (A scalar reproduction sketch follows the table.) |
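The scalar setup in the last row is simple enough to sketch in a few lines. The snippet below is a minimal reproduction attempt, not the authors' code: it assumes a scalar factorization loss f(x, y) = 0.5·(xy − a)² with target a = 1 (neither the loss nor a is stated in the table above), and runs gradient descent from the three reported initial conditions at the six learning rates h0, (6/7)h0, ..., (2/7)h0, printing the final residual and the imbalance x² − y² that the paper's balancing effect concerns.

```python
def gd_scalar_factorization(x0, y0, h, a=1.0, steps=5000):
    """Gradient descent on the ASSUMED loss f(x, y) = 0.5 * (x * y - a)**2.

    Neither this loss nor a = 1.0 is stated in the table; they stand in
    for the paper's scalar objective.
    """
    x, y = float(x0), float(y0)
    for _ in range(steps):
        r = x * y - a                        # residual of the current fit
        # simultaneous update: both coordinates use the pre-step (x, y)
        x, y = x - h * y * r, y - h * x * r
    return x, y


for x0, y0 in [(9, 1), (19, 1), (99, 1)]:
    h0 = 4.0 / (x0 ** 2 + y0 ** 2 + 8)       # largest rate in the reported range
    for k in range(7, 1, -1):                # h0, (6/7)h0, ..., (2/7)h0
        h = k / 7 * h0
        x, y = gd_scalar_factorization(x0, y0, h)
        print(f"(x0, y0)=({x0}, {y0})  h={h:.3e}  "
              f"residual={x * y - 1:+.2e}  imbalance x^2-y^2={x * x - y * y:+.2e}")
```

If the assumed loss matches the paper's, the printout should show the imbalance shrinking markedly at the larger rates while staying close to its initial value at the smaller ones, which is the balancing effect the title refers to.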
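The Open Datasets row mentions a matrix experiment: entries of A drawn from a Gaussian distribution and a random initialization (X0, Y0). A hedged sketch of that data generation and a GD loop follows, assuming the factorization loss f(X, Y) = 0.5·‖A − XYᵀ‖_F² together with hypothetical problem sizes and a learning rate formed by analogy with the scalar h0; none of these specifics are reported in the table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes; the table does not report the dimensions used.
n, m, r = 20, 20, 5

A = rng.normal(size=(n, m))   # entries of A drawn from a Gaussian, as in the table
X = rng.normal(size=(n, r))   # random initial condition X0
Y = rng.normal(size=(m, r))   # random initial condition Y0

# Assumed learning rate, formed by analogy with the scalar h0 above;
# the paper's matrix-case rate is not reproduced in this table.
h = 4.0 / (np.linalg.norm(X) ** 2 + np.linalg.norm(Y) ** 2 + 8)

for _ in range(5000):
    R = X @ Y.T - A                            # residual matrix
    X, Y = X - h * R @ Y, Y - h * R.T @ X      # simultaneous GD update

print("final loss:", 0.5 * np.linalg.norm(X @ Y.T - A) ** 2)
print("imbalance ||X||_F^2 - ||Y||_F^2:",
      np.linalg.norm(X) ** 2 - np.linalg.norm(Y) ** 2)
```

As in the scalar sketch, the final Frobenius-norm imbalance is the quantity to watch; the exact dimensions, rank, and step count here are placeholders rather than the paper's settings.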