Learning ReLUs via Gradient Descent
Authors: Mahdi Soltanolkotabi
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we carry out a simple numerical experiment to corroborate our theoretical results. |
| Researcher Affiliation | Academia | Mahdi Soltanolkotabi, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, soltanol@usc.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available. |
| Open Datasets | No | The paper states: "For this purpose we generate a unit norm sparse vector w ∈ R^d of dimension d = 1000 containing s = d/50 non-zero entries. We also generate a random feature matrix X ∈ R^{n×d} with n = 8s log(d/s) and containing i.i.d. N(0,1) entries." This indicates data was generated for the experiment, not drawn from a publicly available source. |
| Dataset Splits | No | The paper describes generating synthetic data and running experiments, but does not specify details regarding training, validation, or testing dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers. |
| Experiment Setup | Yes | For this purpose we generate a unit norm sparse vector w ∈ R^d of dimension d = 1000 containing s = d/50 non-zero entries. We also generate a random feature matrix X ∈ R^{n×d} with n = 8s log(d/s) and containing i.i.d. N(0,1) entries. We apply the projected gradient iterations to both observation models starting from w_0 = 0. For the ReLU observations we use the step size discussed in Theorem 3.1. For the linear model we apply projected gradient descent updates of the form w_{τ+1} = P_K(w_τ − (1/n) X^T (X w_τ − y)). In both cases we use the regularizer R(w) = ‖w‖_{ℓ0}, so the projection only keeps the top s entries of the vector (a.k.a. iterative hard thresholding). A minimal reproduction sketch follows the table. |
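
The quoted setup is simple enough to sketch in a few lines of NumPy. The following is a minimal sketch, not the authors' code: it assumes the ReLU observations are y = max(Xw, 0), that the ℓ0 projection keeps the s largest-magnitude entries, that the ReLU derivative is taken as 1/2 at zero so the iteration can leave w_0 = 0, and it uses a placeholder step size for the ReLU model since the exact value from Theorem 3.1 is not quoted above. The helper names `hard_threshold` and `iht` are illustrative, not from the paper.

```python
# Sketch of the synthetic experiment: sparse planted vector, Gaussian features,
# projected gradient descent (iterative hard thresholding) for the ReLU and linear models.
import numpy as np

rng = np.random.default_rng(0)

d = 1000
s = d // 50                                   # number of non-zero entries
n = int(np.ceil(8 * s * np.log(d / s)))       # n = 8 s log(d/s) samples

# Unit-norm s-sparse planted vector w
w_true = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
w_true[support] = rng.standard_normal(s)
w_true /= np.linalg.norm(w_true)

# Random feature matrix with i.i.d. N(0, 1) entries
X = rng.standard_normal((n, d))
y_lin = X @ w_true                            # linear observations
y_relu = np.maximum(X @ w_true, 0.0)          # ReLU observations


def hard_threshold(w, s):
    """Projection induced by R(w) = ||w||_0: keep the s largest-magnitude entries."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out


def iht(y, X, s, grad, mu, iters=500):
    """Projected gradient descent starting from w_0 = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = hard_threshold(w - mu * grad(w, X, y), s)
    return w


def grad_lin(w, X, y):
    # Gradient of (1/2n) ||Xw - y||^2, matching the quoted update (1/n) X^T (Xw - y)
    return X.T @ (X @ w - y) / X.shape[0]


def grad_relu(w, X, y):
    # Gradient of (1/2n) ||max(Xw, 0) - y||^2, with the ReLU derivative set to 1/2 at 0
    z = X @ w
    return X.T @ ((np.maximum(z, 0.0) - y) * 0.5 * (1.0 + np.sign(z))) / X.shape[0]


w_hat_lin = iht(y_lin, X, s, grad_lin, mu=1.0)    # mu = 1 matches the quoted 1/n scaling
w_hat_relu = iht(y_relu, X, s, grad_relu, mu=1.0) # placeholder step size, not Theorem 3.1's

print("linear model estimation error:", np.linalg.norm(w_hat_lin - w_true))
print("ReLU model estimation error:  ", np.linalg.norm(w_hat_relu - w_true))
```

Because the regularizer is the ℓ0 "norm", the projection step reduces to keeping the top-s entries by magnitude, which is why the paper refers to the procedure as iterative hard thresholding.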