On feature learning in neural networks with global convergence guarantees
Authors: Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart." The paper also devotes a section to experiments ("4 NUMERICAL EXPERIMENTS"). |
| Researcher Affiliation | Academia | Zhengdao Chen, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA, zc1216@nyu.edu; Eric Vanden-Eijnden, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA, eve2@cims.nyu.edu; Joan Bruna, Courant Institute of Mathematical Sciences and Center for Data Science, New York University, New York, NY 10012, USA, bruna@cims.nyu.edu |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | "The data set is inspired by [69]": both the training and the test set are sampled i.i.d. from a distribution (x, y) ~ D on R^{d+1}, under which the joint distribution of (x1, x2, y) is P(x1 = 1, x2 = 0, y = 1) = 1/4, P(x1 = −1, x2 = 0, y = 1) = 1/4, P(x1 = 0, x2 = 1, y = −1) = 1/4, P(x1 = 0, x2 = −1, y = −1) = 1/4, while x3, ..., xd each follow the uniform distribution on [−1, 1], independently of each other as well as of x1, x2 and y. While inspired by [69], the authors describe a generative process rather than providing access to a specific pre-existing public dataset (see the sampling sketch after this table). |
| Dataset Splits | No | The paper mentions training and test sets but does not specify a validation set or explicit train/test/validation dataset splits. |
| Hardware Specification | Yes | The experiments are run with NVIDIA GPUs (1080ti and Titan RTX). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies used in their experiments. |
| Experiment Setup | Yes | "We choose to train the models using 50000 steps of (full-batch) GD with step size δ = 1."; "We choose σ to be tanh."; "For each choice of n, we run the experiment with 5 different random seeds..." (a training-loop sketch follows this table). |
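
Since no code is released, the generative process quoted in the Open Datasets row can only be reproduced from the text. Below is a minimal Python/NumPy sampling sketch; the function name `sample_dataset`, the exact sign and label assignment of the four atoms, and the choice of `d` are assumptions reconstructed from the extracted description, not the authors' implementation.

```python
import numpy as np

def sample_dataset(n, d, rng=None):
    """Sample n points (x, y), x in R^d, following the generative process
    quoted in the Open Datasets row.

    The four equally likely configurations of (x1, x2, y) are reconstructed
    from the extracted text and may differ in sign conventions from the
    authors' exact setup.
    """
    rng = np.random.default_rng(rng)
    # Four atoms for (x1, x2, y), each with probability 1/4.
    atoms = np.array([
        [ 1.0,  0.0,  1.0],
        [-1.0,  0.0,  1.0],
        [ 0.0,  1.0, -1.0],
        [ 0.0, -1.0, -1.0],
    ])
    idx = rng.integers(0, 4, size=n)
    x = np.empty((n, d))
    x[:, :2] = atoms[idx, :2]
    # Remaining coordinates are i.i.d. Uniform[-1, 1], independent of (x1, x2, y).
    x[:, 2:] = rng.uniform(-1.0, 1.0, size=(n, d - 2))
    y = atoms[idx, 2]
    return x, y
```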
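Similarly, the Experiment Setup row pins down the optimizer (full-batch GD, 50000 steps, step size δ = 1), the activation (tanh), and 5 random seeds per width n. The PyTorch sketch below is a hedged illustration of that loop; the network architecture, loss, and default width `n` are placeholders, since the excerpt does not specify them.

```python
import torch

def train_full_batch(x, y, n=100, steps=50_000, delta=1.0, seed=0):
    """Full-batch gradient descent with step size delta, as described in the
    Experiment Setup row. The two-layer tanh network and squared loss are
    illustrative placeholders, not the authors' exact multi-layer model."""
    torch.manual_seed(seed)
    x = torch.as_tensor(x, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.float32)
    d = x.shape[1]
    model = torch.nn.Sequential(
        torch.nn.Linear(d, n),
        torch.nn.Tanh(),          # sigma = tanh, as stated in the paper
        torch.nn.Linear(n, 1),
    )
    opt = torch.optim.SGD(model.parameters(), lr=delta)  # step size delta = 1
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(x).squeeze(-1) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return model

# One run per seed, mirroring the "5 different random seeds" protocol:
# for seed in range(5): train_full_batch(x, y, seed=seed)
```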