On feature learning in neural networks with global convergence guarantees

Authors: Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart." See also Section 4, "Numerical Experiments".
Researcher Affiliation | Academia | Zhengdao Chen, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA (zc1216@nyu.edu); Eric Vanden-Eijnden, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA (eve2@cims.nyu.edu); Joan Bruna, Courant Institute of Mathematical Sciences and Center for Data Science, New York University, New York, NY 10012, USA (bruna@cims.nyu.edu)
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | "The data set is inspired by [69]: We sample both the training and the test set i.i.d. from the distribution (x, y) ~ D on R^(d+1), under which the joint distribution of (x1, x2, y) is P(x1 = 1, x2 = 0, y = 1) = 1/4, P(x1 = -1, x2 = 0, y = 1) = 1/4, P(x1 = 0, x2 = 1, y = -1) = 1/4, P(x1 = 0, x2 = -1, y = -1) = 1/4, and x3, ..., xd each follow the uniform distribution on [-1, 1], independently from each other as well as from x1, x2 and y." While inspired by [69], this describes a generative process rather than access to a specific pre-existing public dataset.
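For concreteness, the generative process quoted above can be sketched in a few lines of NumPy. This is a hedged reconstruction, not the authors' code: the function name sample_dataset is hypothetical, and the signs of the four atoms and their 1/4 probabilities are recovered from the garbled quote, so the exact sign convention may differ from the paper's.

```python
import numpy as np

def sample_dataset(n, d, seed=None):
    """Sample n points from the synthetic distribution described above.

    The first two coordinates carry the label: x lies on one of four
    axis-aligned atoms, each with (reconstructed) probability 1/4, and
    y = +1 on the e1 axis, y = -1 on the e2 axis. Coordinates x3..xd
    are i.i.d. Uniform[-1, 1] noise, independent of (x1, x2, y).
    """
    rng = np.random.default_rng(seed)
    # Atoms and labels: sign pattern is a reconstruction of the quote.
    atoms = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    labels = np.array([1.0, 1.0, -1.0, -1.0])
    idx = rng.integers(0, 4, size=n)
    X = np.empty((n, d))
    X[:, :2] = atoms[idx]
    X[:, 2:] = rng.uniform(-1.0, 1.0, size=(n, d - 2))
    return X, labels[idx]

# Example: a training set with d = 16 (the dimension is illustrative).
X_train, y_train = sample_dataset(n=1024, d=16, seed=0)
```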
Dataset Splits | No | The paper mentions training and test sets but does not specify a validation set or explicit train/validation/test splits.
Hardware Specification | Yes | "The experiments are run with NVIDIA GPUs (1080ti and Titan RTX)."
Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies used in its experiments.
Experiment Setup | Yes | "We choose to train the models using 50000 steps of (full-batch) GD with step size δ = 1."; "We choose σ to be tanh."; "For each choice of n, we run the experiment with 5 different random seeds..."
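The quoted setup pins down the optimizer (full-batch GD, step size δ = 1, 50000 steps) and the activation (tanh), but not the loss, depth, or width. The sketch below fills those gaps with assumptions, flagged in comments: a three-layer network, squared loss, and illustrative dimensions. It is a minimal stand-in for the protocol, not the authors' implementation.

```python
import torch

torch.manual_seed(0)  # the paper repeats each setting over 5 random seeds
d, width, n = 16, 512, 1024  # dimension, width, sample count: illustrative

# Synthetic data as in the sampler sketched earlier (atoms on the e1/e2 axes).
atoms = torch.tensor([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
labels = torch.tensor([1.0, 1.0, -1.0, -1.0])
idx = torch.randint(0, 4, (n,))
X = torch.cat([atoms[idx], 2 * torch.rand(n, d - 2) - 1], dim=1)
y = labels[idx].unsqueeze(1)

# A multi-layer tanh network; depth and width are assumptions.
model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.Tanh(),
    torch.nn.Linear(width, width), torch.nn.Tanh(),
    torch.nn.Linear(width, 1),
)

# Full-batch GD with step size delta = 1 for 50000 steps, as quoted;
# the squared loss is an assumption not specified in the quotes.
opt = torch.optim.SGD(model.parameters(), lr=1.0)
for _ in range(50_000):
    opt.zero_grad()
    loss = torch.mean((model(X) - y) ** 2)
    loss.backward()
    opt.step()
```

To match the quoted protocol, this loop would be rerun with 5 different values passed to torch.manual_seed for each choice of n, and results averaged across seeds.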