Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations. Furthermore, we empirically compare the performance of (GLM-tron) and (SGD) for ReLU regression with symmetric Bernoulli data. Simulation results are presented in Figure 1. In the well-specified setting, Figures 1(a) and 1(b) show that the excess risk of (GLM-tron) is no worse than that of (SGD), even when both algorithms are tuned with their hyperparameters (initial stepsizes) respectively. This verifies our Theorem 6.2. In the noiseless setting, Figure 1(c) clearly illustrates that (SGD) can converge to a critical point with constant risk, while (GLM-tron) successfully recovers the true parameters w*. This verifies our Theorem 6.3. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Johns Hopkins University; (2) Department of Computer Science, The University of Hong Kong; (3) Department of Computer Science, University of California, Los Angeles; (4) Department of Computer Science, Rice University; (5) Department of Computer Science and Department of Statistics, Harvard University. |
| Pseudocode | No | The paper describes the SGD and GLM-tron update rules with mathematical equations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to code repositories for the described methodology. |
| Open Datasets | No | The paper uses synthetic data models like "symmetric Bernoulli distribution" and "Gaussian Distribution" for its simulations, specifying how data is generated (e.g., "P{x = ei} = P{x = -ei} = λi/2"). It does not use or provide access to a pre-existing public dataset. |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits. It discusses using N i.i.d. samples for algorithms and evaluating excess risk. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (CPU, GPU models, memory, etc.) used to run the experiments or simulations. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | For each algorithm and each sample size, we do a grid search on the initial stepsize γ0 ∈ {0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01} and report the best excess risk. The plots are averaged over 20 independent runs. |
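
Since the paper releases no code, the simulation described in the table can only be approximated. The following is a minimal sketch in Python with NumPy, not the authors' implementation. It draws symmetric Bernoulli data with P{x = e_i} = P{x = -e_i} = λ_i/2, runs one pass of SGD or GLM-tron for ReLU regression, and grid-searches the stepsize over the set quoted above. Several details are assumptions made for illustration: the stepsize is held constant rather than decayed, the initial point is 0.1 in every coordinate, the ReLU derivative at zero is taken to be 0, risks are averaged over runs before the best stepsize is picked, and all function names (`run_one_pass`, `best_excess_risk`, and so on) are hypothetical.

```python
import numpy as np

# Stepsize grid quoted in the "Experiment Setup" row.
GAMMA_GRID = (0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01)


def relu(z):
    return np.maximum(z, 0.0)


def sample_symmetric_bernoulli(rng, lam, n):
    """Draw n samples with P{x = e_i} = P{x = -e_i} = lam[i] / 2."""
    idx = rng.choice(lam.size, size=n, p=lam)
    sign = rng.choice([-1.0, 1.0], size=n)
    x = np.zeros((n, lam.size))
    x[np.arange(n), idx] = sign
    return x


def run_one_pass(x, y, gamma, method, w0):
    """One pass over the data with a constant stepsize (an assumption)."""
    w = w0.copy()
    for xt, yt in zip(x, y):
        pred = xt @ w
        residual = relu(pred) - yt
        if method == "sgd":
            # SGD keeps the ReLU derivative 1[pred > 0] in the gradient.
            residual = residual * float(pred > 0)
        # GLM-tron uses the residual alone, dropping the derivative factor.
        w = w - gamma * residual * xt
    return w


def excess_risk(w, w_star, lam):
    """Exact E[(relu(x.w) - relu(x.w_star))^2] under symmetric Bernoulli x."""
    pos = (relu(w) - relu(w_star)) ** 2      # contribution of x = +e_i
    neg = (relu(-w) - relu(-w_star)) ** 2    # contribution of x = -e_i
    return float(np.sum(lam / 2.0 * (pos + neg)))


def best_excess_risk(method, lam, w_star, n, n_runs=20, noise_std=0.0, seed=0):
    """Average over runs for each stepsize, then report the best average."""
    rng = np.random.default_rng(seed)
    w0 = np.full(lam.size, 0.1)  # assumed initialization
    mean_risks = []
    for gamma in GAMMA_GRID:
        risks = []
        for _ in range(n_runs):
            x = sample_symmetric_bernoulli(rng, lam, n)
            y = relu(x @ w_star) + noise_std * rng.standard_normal(n)
            w = run_one_pass(x, y, gamma, method, w0)
            risks.append(excess_risk(w, w_star, lam))
        mean_risks.append(np.mean(risks))
    return float(min(mean_risks))
```

With a noiseless target that has negative coordinates, for example `w_star = np.array([1.0, -1.0, 1.0, -1.0])` and uniform `lam`, calling `best_excess_risk("sgd", ...)` and `best_excess_risk("glmtron", ...)` should show the qualitative gap reported for Figure 1(c): under this initialization the SGD iterate cannot cross zero on the negative coordinates, while GLM-tron can.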