Learning ReLUs via Gradient Descent

Authors: Mahdi Soltanolkotabi

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states: "In this section we carry out a simple numerical experiment to corroborate our theoretical results."
Researcher Affiliation | Academia | Mahdi Soltanolkotabi, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, soltanol@usc.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available.
Open Datasets | No | The paper states: "For this purpose we generate a unit norm sparse vector w ∈ R^d of dimension d = 1000 containing s = d/50 non-zero entries. We also generate a random feature matrix X ∈ R^{n×d} with n = 8s log(d/s) and containing i.i.d. N(0,1) entries." This indicates the data was synthetically generated for the experiment rather than drawn from a publicly available source (see the data-generation sketch after the table).
Dataset Splits | No | The paper describes generating synthetic data and running experiments, but does not specify details regarding training, validation, or testing dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers.
Experiment Setup | Yes | The paper states: "For this purpose we generate a unit norm sparse vector w ∈ R^d of dimension d = 1000 containing s = d/50 non-zero entries. We also generate a random feature matrix X ∈ R^{n×d} with n = 8s log(d/s) and containing i.i.d. N(0,1) entries. We apply the projected gradient iterations to both observation models starting from w_0 = 0. For the ReLU observations we use the step size discussed in Theorem 3.1. For the linear model we apply projected gradient descent updates of the form w_{τ+1} = P_K(w_τ - (1/n) X^T(X w_τ - y)). In both cases we use the regularizer R(w) = ||w||_{ℓ0} so that the projection only keeps the top s entries of the vector (a.k.a. iterative hard thresholding)." A sketch of these updates follows the table.
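The synthetic setup quoted in the Open Datasets and Experiment Setup rows can be mimicked with a short numpy sketch. This is a minimal illustration under the quoted choices (d = 1000, s = d/50, n = 8s log(d/s), a unit-norm s-sparse planted vector, i.i.d. N(0,1) features); the function and variable names are ours, not the paper's, and rounding n up to an integer is an assumption.

```python
import numpy as np

def generate_synthetic_data(d=1000, sparsity_fraction=1 / 50, rng=None):
    """Generate the synthetic instance described in the paper's experiment:
    a unit-norm s-sparse planted vector w_star, a Gaussian feature matrix X,
    and observations from both the linear and the ReLU model."""
    rng = np.random.default_rng() if rng is None else rng
    s = int(d * sparsity_fraction)             # s = d/50 non-zero entries
    n = int(np.ceil(8 * s * np.log(d / s)))    # n = 8 s log(d/s) samples (rounded up)

    # Unit-norm s-sparse planted vector: random support, Gaussian values, then normalize.
    w_star = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)
    w_star[support] = rng.standard_normal(s)
    w_star /= np.linalg.norm(w_star)

    # Feature matrix with i.i.d. N(0, 1) entries.
    X = rng.standard_normal((n, d))

    y_linear = X @ w_star                  # linear observations y = Xw
    y_relu = np.maximum(X @ w_star, 0.0)   # ReLU observations y = max(Xw, 0)
    return w_star, X, y_linear, y_relu
```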
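The update rules quoted in the Experiment Setup row amount to projected gradient descent with a hard-thresholding projection that keeps the s largest-magnitude entries. Below is a minimal sketch assuming the data generator above; for the ReLU observations we assume the natural squared loss on ReLU outputs, and the step size mu is a placeholder for the value discussed in Theorem 3.1, which the quoted text does not restate.

```python
import numpy as np

def hard_threshold(w, s):
    """Projection induced by R(w) = ||w||_0: keep the s largest-magnitude entries."""
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -s)[-s:]
    out[keep] = w[keep]
    return out

def pgd_linear(X, y, s, n_iters=500):
    """Iterative hard thresholding for the linear model, starting from w_0 = 0:
    w_{tau+1} = P_K(w_tau - (1/n) X^T (X w_tau - y))."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        w = hard_threshold(w - X.T @ (X @ w - y) / n, s)
    return w

def pgd_relu(X, y, s, mu=1.0, n_iters=500):
    """Projected gradient descent for ReLU observations, starting from w_0 = 0,
    on the (assumed) loss (1/2n) * sum_i (ReLU(x_i^T w) - y_i)^2.
    mu is a placeholder for the step size of Theorem 3.1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        z = X @ w
        # Gradient of 0.5 * (ReLU(z) - y)^2 with respect to z is (ReLU(z) - y) * 1{z > 0}.
        residual = (np.maximum(z, 0.0) - y) * (z > 0)
        w = hard_threshold(w - mu * (X.T @ residual) / n, s)
    return w

# Example usage with the generator sketched above (names are ours):
# w_star, X, y_linear, y_relu = generate_synthetic_data()
# s = np.count_nonzero(w_star)
# err_lin = np.linalg.norm(pgd_linear(X, y_linear, s) - w_star)
# err_relu = np.linalg.norm(pgd_relu(X, y_relu, s) - w_star)
```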