Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations. Furthermore, we empirically compare the performance of (GLM-tron) and (SGD) for ReLU regression with symmetric Bernoulli data. Simulation results are presented in Figure 1. In the well-specified setting, Figures 1(a) and 1(b) show that the excess risk of (GLM-tron) is no worse than that of (SGD), even when both algorithms are tuned with their respective hyperparameters (initial stepsizes). This verifies our Theorem 6.2. In the noiseless setting, Figure 1(c) clearly illustrates that (SGD) can converge to a critical point with constant risk, while (GLM-tron) successfully recovers the true parameter w*. This verifies our Theorem 6.3. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Johns Hopkins University 2Department of Computer Science, The University of Hong Kong 3Department of Computer Science, University of California, Los Angeles 4Department of Computer Science, Rice University 5Department of Computer Science and Department of Statistics, Harvard University. |
| Pseudocode | No | The paper describes the SGD and GLM-tron update rules with mathematical equations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to code repositories for the described methodology. |
| Open Datasets | No | The paper uses synthetic data models such as the "symmetric Bernoulli distribution" and the "Gaussian distribution" for its simulations, specifying how data is generated (e.g., "P{x = e_i} = P{x = -e_i} = λ_i/2"). It does not use or provide access to a pre-existing public dataset. |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits. It discusses using N i.i.d. samples for algorithms and evaluating excess risk. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (CPU, GPU models, memory, etc.) used to run the experiments or simulations. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For each algorithm and each sample size, we do a grid search on the initial stepsize γ0 ∈ {0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01} and report the best excess risk. The plots are averaged over 20 independent runs. |
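Since the paper states the update rules only as equations and releases no code, the simulation setup can be reconstructed as a sketch. The following is a minimal, hedged reconstruction (not the authors' implementation) of the symmetric Bernoulli data model P{x = e_i} = P{x = -e_i} = λ_i/2 and the two single-step updates: GLM-tron omits the ReLU derivative in its update, while SGD on the squared loss includes it, which is the mechanism behind SGD stalling at a critical point in the noiseless setting.

```python
import numpy as np

def relu(z):
    """ReLU activation."""
    return np.maximum(z, 0.0)

def sample_symmetric_bernoulli(lam, n, rng):
    """Draw n samples with P{x = e_i} = P{x = -e_i} = lam[i] / 2,
    where lam is a probability vector over the d coordinates."""
    d = len(lam)
    idx = rng.choice(d, size=n, p=lam)        # which basis vector
    signs = rng.choice([-1.0, 1.0], size=n)   # symmetric sign flip
    X = np.zeros((n, d))
    X[np.arange(n), idx] = signs
    return X

def glmtron_step(w, x, y, gamma):
    """GLM-tron update: no derivative factor on the link function."""
    return w + gamma * (y - relu(w @ x)) * x

def sgd_step(w, x, y, gamma):
    """SGD on the squared loss: includes the factor relu'(w @ x),
    which is zero whenever w @ x <= 0."""
    grad_factor = (relu(w @ x) - y) * float(w @ x > 0)
    return w - gamma * grad_factor * x
```

Starting from w = 0, `sgd_step` leaves the iterate unchanged (the ReLU derivative vanishes), whereas `glmtron_step` still makes progress toward the target, illustrating the qualitative gap the paper's Figure 1(c) reports.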
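The stated experiment setup (grid search over initial stepsizes, best excess risk reported, 20 independent runs) can be sketched as follows. `run_algorithm` is a hypothetical callable standing in for one training run of either GLM-tron or SGD; the averaging-then-best order is one reasonable reading of the paper's description, not a confirmed detail.

```python
import numpy as np

# Grid of initial stepsizes from the paper's experiment setup
STEPSIZE_GRID = [0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01]
N_RUNS = 20  # independent runs per configuration

def tune_and_evaluate(run_algorithm, sample_size, rng):
    """For one algorithm and one sample size, grid-search the initial
    stepsize and return the best mean excess risk over N_RUNS runs.

    run_algorithm(n, gamma0, seed) is a hypothetical callable that
    returns the excess risk of a single independent run."""
    best = np.inf
    for gamma0 in STEPSIZE_GRID:
        risks = [run_algorithm(sample_size, gamma0, int(rng.integers(2**31)))
                 for _ in range(N_RUNS)]
        best = min(best, float(np.mean(risks)))
    return best
```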