Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Authors: Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations. Furthermore, we empirically compare the performance of (GLM-tron) and (SGD) for ReLU regression with symmetric Bernoulli data. Simulation results are presented in Figure 1. In the well-specified setting, Figures 1(a) and 1(b) show that the excess risk of (GLM-tron) is no worse than that of (SGD), even when both algorithms are tuned over their respective hyperparameters (initial stepsizes). This verifies our Theorem 6.2. In the noiseless setting, Figure 1(c) clearly illustrates that (SGD) can converge to a critical point with constant risk, while (GLM-tron) successfully recovers the true parameter w*. This verifies our Theorem 6.3.
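For context, under the well-specified model (additive noise independent of x), the excess risk these plots track reduces to E[(ReLU(<w, x>) - ReLU(<w*, x>))^2], which becomes a finite sum under the symmetric Bernoulli input model. A minimal sketch of an evaluator (function and variable names are ours, not the paper's):

```python
import numpy as np

def excess_risk(w, w_star, lam):
    """Closed-form excess risk under symmetric Bernoulli inputs.

    x equals +e_i or -e_i with probability lam[i]/2 each, so the
    expectation over x is a finite sum over the 2d support points,
    where <w, +e_i> = w[i] and <w, -e_i> = -w[i].
    """
    relu = lambda z: np.maximum(z, 0.0)
    return float(np.sum(lam / 2 * ((relu(w) - relu(w_star)) ** 2
                                   + (relu(-w) - relu(-w_star)) ** 2)))
```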
Researcher Affiliation | Academia | (1) Department of Computer Science, Johns Hopkins University; (2) Department of Computer Science, The University of Hong Kong; (3) Department of Computer Science, University of California, Los Angeles; (4) Department of Computer Science, Rice University; (5) Department of Computer Science and Department of Statistics, Harvard University.
Pseudocode | No | The paper describes the SGD and GLM-tron update rules with mathematical equations but does not present them in a structured pseudocode or algorithm block.
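As a concrete reference, here is a minimal sketch of the two single-sample updates as the standard equations read (the key difference is that GLM-tron drops the ReLU derivative that SGD on the squared loss retains); the function names and signatures are ours:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def glmtron_step(w, x, y, gamma):
    # GLM-tron: w <- w - gamma * (ReLU(<w, x>) - y) * x.
    # A "pseudo-gradient" update with no ReLU derivative factor.
    return w - gamma * (relu(w @ x) - y) * x

def sgd_step(w, x, y, gamma):
    # SGD on the squared loss 0.5 * (ReLU(<w, x>) - y)^2.
    # The chain rule brings in the ReLU derivative 1{<w, x> > 0},
    # so the update vanishes whenever the neuron is inactive;
    # this is why SGD can converge to a critical point with
    # constant risk in the noiseless setting.
    active = float(w @ x > 0)
    return w - gamma * (relu(w @ x) - y) * active * x
```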
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper uses synthetic data models such as the symmetric Bernoulli distribution and the Gaussian distribution for its simulations, specifying how data are generated (e.g., P{x = e_i} = P{x = -e_i} = λ_i/2). It does not use or provide access to a pre-existing public dataset.
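A sampler matching the quoted specification could look as follows (our own sketch; the Gaussian noise and the well-specified labels y = ReLU(<w*, x>) + noise are illustrative assumptions):

```python
import numpy as np

def sample_symmetric_bernoulli(n, lam, w_star, noise_std, rng):
    """Draw x = +e_i or -e_i with probability lam[i]/2 each,
    plus a noisy ReLU label (well-specified model)."""
    d = len(lam)
    idx = rng.choice(d, size=n, p=lam / lam.sum())  # which coordinate e_i
    sign = rng.choice([-1.0, 1.0], size=n)          # +e_i or -e_i
    X = np.zeros((n, d))
    X[np.arange(n), idx] = sign
    y = np.maximum(X @ w_star, 0.0) + noise_std * rng.standard_normal(n)
    return X, y
```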
Dataset Splits | No | The paper does not explicitly mention training, validation, or test splits. It discusses using N i.i.d. samples for the algorithms and evaluating excess risk.
Hardware Specification | No | The paper does not provide any details about the hardware (CPU, GPU model, memory, etc.) used to run the experiments or simulations.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | For each algorithm and each sample size, we do a grid search on the initial stepsize γ0 ∈ {0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01} and report the best excess risk. The plots are averaged over 20 independent runs.
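In code, that tuning protocol might look like the sketch below (`run_algorithm` is a hypothetical stand-in for either update rule above, assumed to draw fresh samples, run the iterations, and return the final excess risk):

```python
import numpy as np

STEPSIZE_GRID = [0.5, 0.25, 0.1, 0.075, 0.05, 0.025, 0.01]

def tune_and_report(run_algorithm, n_samples, n_runs=20, seed=0):
    """Grid-search the initial stepsize; report the best mean excess risk."""
    best = np.inf
    for gamma0 in STEPSIZE_GRID:
        risks = [run_algorithm(n_samples, gamma0, np.random.default_rng(seed + r))
                 for r in range(n_runs)]
        best = min(best, float(np.mean(risks)))
    return best
```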