Learning single-index models with shallow neural networks
Authors: Alberto Bietti, Joan Bruna, Clayton Sanford, Min Jae Song
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our theoretical results with experiments in Section A. |
| Researcher Affiliation | Academia | Alberto Bietti (New York University); Joan Bruna (New York University); Clayton Sanford (Columbia University); Min Jae Song (New York University) |
| Pseudocode | Yes | The overall approach is described in Procedure 1. ... Procedure 1 Gradient Flow ... Procedure 2 Fine-Tuning (a rough sketch of this two-phase structure appears below the table) |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Supplemental zip file. |
| Open Datasets | No | We focus on regression problems under a single-index model with Gaussian input data. Specifically, we assume d-dimensional inputs x ∼ γ_d := N(0, I_d), and labels y = F(x) + ε = f(⟨θ, x⟩) + ε, where θ ∈ S^{d−1} and ε ∼ N(0, σ²) is an independent, additive Gaussian noise. (A data-generation sketch appears below the table.) |
| Dataset Splits | No | The paper discusses the theoretical sample size 'n' and uses synthetic data, but does not specify explicit training, validation, or test dataset splits needed for reproduction. |
| Hardware Specification | Yes | We ran all experiments on an Intel Core i9-9900K CPU @ 3.60GHz with 64GB RAM and a NVIDIA GeForce RTX 2080 Ti. |
| Software Dependencies | No | We generate synthetic data as specified in Section 3 and train our models using PyTorch. (No specific version numbers are provided for PyTorch or other software.) |
| Experiment Setup | Yes | The network has N = 1000 hidden units with biases drawn from N(0, τ²) with τ = 1.2. We use the Adam optimizer with learning rate 0.01, and default PyTorch parameters for momentum. We train for 100 epochs, with batch size 100. For the experiments in Figure 1, we set the dimension d = 50. For Figure 2, we set d = 100, N = 2000, and batch size 200. (A matching training sketch appears below the table.) |
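
For concreteness, here is a minimal PyTorch sketch of the synthetic data-generation process quoted in the Open Datasets row: inputs x ∼ N(0, I_d), a direction θ drawn uniformly on the unit sphere, and labels y = f(⟨θ, x⟩) + ε with Gaussian noise. The cubic link function below is an illustrative placeholder; the paper studies a family of link functions and does not fix this particular f.

```python
import torch

def make_single_index_data(n, d, sigma=0.1, seed=0):
    """Draw (x, y) with x ~ N(0, I_d) and y = f(<theta, x>) + eps,
    where eps ~ N(0, sigma^2) and theta is uniform on the sphere S^{d-1}."""
    g = torch.Generator().manual_seed(seed)
    theta = torch.randn(d, generator=g)
    theta = theta / theta.norm()               # project onto S^{d-1}
    x = torch.randn(n, d, generator=g)         # x ~ gamma_d := N(0, I_d)
    z = x @ theta                              # scalar index <theta, x>
    f = lambda t: t**3 - 3 * t                 # illustrative link (Hermite He_3)
    y = f(z) + sigma * torch.randn(n, generator=g)
    return x, y, theta
```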
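The Experiment Setup row pins down most of the training configuration (d = 50, N = 1000, bias std 1.2, Adam with learning rate 0.01, 100 epochs, batch size 100). A hedged sketch of a training loop consistent with that row follows; the one-hidden-layer ReLU architecture and the training-set size n are assumptions on our part, and the paper's supplemental zip file remains the authoritative reference.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

d, N, tau = 50, 1000, 1.2
x, y, _ = make_single_index_data(n=10_000, d=d)   # n = 10k is an assumed sample size

# One-hidden-layer ReLU network; biases initialized from N(0, tau^2) as in the
# setup row, all other initializations left at PyTorch defaults.
model = nn.Sequential(nn.Linear(d, N), nn.ReLU(), nn.Linear(N, 1))
with torch.no_grad():
    model[0].bias.normal_(0.0, tau)

opt = torch.optim.Adam(model.parameters(), lr=0.01)
loader = DataLoader(TensorDataset(x, y.unsqueeze(1)), batch_size=100, shuffle=True)

for epoch in range(100):
    for xb, yb in loader:
        opt.zero_grad()
        nn.functional.mse_loss(model(xb), yb).backward()
        opt.step()
```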
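Finally, the Pseudocode row names a two-phase pipeline (Procedure 1 "Gradient Flow", then Procedure 2 "Fine-Tuning"). The sketch below captures only that two-phase structure under assumed details: phase 1 runs discretized gradient descent on the first-layer weights with the second layer held fixed, and phase 2 refits the second layer on the frozen features by least squares. The loss, step sizes, initialization scales, and the fine-tuning method are all assumptions, not the paper's specification.

```python
import torch

def two_phase_train(x, y, N=1000, tau=1.2, steps=500, lr=0.01):
    """Hypothetical sketch of a gradient-flow-then-fine-tune pipeline."""
    d = x.shape[1]
    W = torch.randn(N, d)
    W = W / W.norm(dim=1, keepdim=True)        # unit-norm first-layer weights
    b = tau * torch.randn(N)                   # biases ~ N(0, tau^2)
    a = torch.randn(N) / N                     # small second layer, fixed in phase 1
    W.requires_grad_(True)

    # Phase 1: discretized gradient flow on the first layer only.
    opt = torch.optim.SGD([W], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = torch.relu(x @ W.t() + b) @ a
        torch.nn.functional.mse_loss(pred, y).backward()
        opt.step()

    # Phase 2: fine-tune by refitting the second layer on frozen features
    # (least squares here; the paper's actual procedure may differ).
    feats = torch.relu(x @ W.detach().t() + b)
    a_hat = torch.linalg.lstsq(feats, y.unsqueeze(1)).solution.squeeze()
    return W.detach(), b, a_hat
```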