Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels

Authors: Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We supplement our theoretical results by demonstrating this phenomenon empirically: in CIFAR-10 and MNIST images with various backgrounds, as the background noise increases in intensity, a CNN's performance stays relatively robust, whereas its corresponding neural tangent kernel sees a notable drop in performance.
Researcher Affiliation | Collaboration | Stefani Karp (Carnegie Mellon University and Google Research, shkarp@cs.cmu.edu); Ezra Winston (Carnegie Mellon University, ewinston@cs.cmu.edu); Yuanzhi Li (Carnegie Mellon University, yuanzhil@cs.cmu.edu); Aarti Singh (Carnegie Mellon University, aarti@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: Mini-batch SGD (see the mini-batch SGD sketch after the table)
Open Source Code | Yes | Code for experiments is available at https://github.com/skarp/local-signal-adaptivity.
Open Datasets | Yes | We create new datasets by embedding CIFAR-10 and MNIST images within either random Gaussian or ImageNet backgrounds. ... CIFAR-10 [Krizhevsky, 2009] ... MNIST [LeCun et al., 2010] ... ImageNet backgrounds [Deng et al., 2009]. (See the dataset-construction sketch after the table.)
Dataset Splits | No | The paper does not explicitly provide training, validation, or test split percentages or counts in the main text.
Hardware Specification | No | The paper describes the models used (e.g., '10-layer Wide ResNet', 'small CNN') but does not specify the hardware used for training or inference, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software such as Neural Tangents and JAX but does not provide version numbers for these or other dependencies. (See the NTK sketch after the table.)
Experiment Setup | Yes | We initialize b deterministically at 0. We initialize w randomly by drawing from N(0, σ_0^2 I_{d×d}), where σ_0 is 1/poly(k). We train the above CNN using mini-batch stochastic gradient descent (SGD) with the logistic loss... We adopt a 1/poly(k) learning rate for w, and we set η_b/η_w = 1/k. ... Sample a mini-batch of examples of size n = poly(k)... T = poly(k) iterations. (See the mini-batch SGD sketch after the table.)
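
The dataset construction quoted in the Open Datasets row can be illustrated with a short sketch. This is not the authors' released code (that lives in the linked repository); the canvas size, noise standard deviation, and random-placement policy below are assumptions chosen for illustration.

```python
# Minimal sketch of the planted-signal dataset: a CIFAR-10/MNIST image is pasted
# at a random location inside a larger background that is either Gaussian noise
# or a crop of an ImageNet image. Canvas size, noise scale, and placement are
# assumptions, not values taken from the paper.
import numpy as np

def embed_in_background(img, canvas_size=64, noise_std=1.0, background=None, rng=None):
    """img: (H, W, C) float array; returns a (canvas_size, canvas_size, C) array."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    if background is None:
        # Gaussian background; noise_std plays the role of "background intensity".
        canvas = rng.normal(0.0, noise_std, size=(canvas_size, canvas_size, c))
    else:
        # ImageNet background: assumed pre-resized/cropped to at least canvas_size.
        canvas = np.array(background, dtype=float)[:canvas_size, :canvas_size, :c]
    # Paste the signal patch at a uniformly random location.
    top = rng.integers(0, canvas_size - h + 1)
    left = rng.integers(0, canvas_size - w + 1)
    canvas[top:top + h, left:left + w, :] = img
    return canvas
```

Sweeping noise_std corresponds to the "background noise increases in intensity" axis along which the CNN-versus-NTK comparison is reported.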
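The Experiment Setup row describes the training procedure (Algorithm 1: mini-batch SGD with logistic loss, b initialized at 0, w drawn from N(0, σ_0^2 I), and η_b/η_w = 1/k). The sketch below illustrates only that update rule, using a plain linear predictor in place of the paper's CNN; the poly(k)-scaled constants (σ_0, η_w, n, T) are left as arguments.

```python
# Illustration of the mini-batch SGD update rule under the reported setup.
# The linear model is a stand-in: the paper trains a CNN, so treat this as a
# sketch of the optimizer, not of the architecture.
import numpy as np

def minibatch_sgd(X, y, k, sigma_0, eta_w, T, batch_size, rng=None):
    """X: (N, d) inputs; y: (N,) labels in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    w = rng.normal(0.0, sigma_0, size=d)   # w ~ N(0, sigma_0^2 I_d)
    b = 0.0                                # b initialized deterministically at 0
    eta_b = eta_w / k                      # eta_b / eta_w = 1/k
    for _ in range(T):
        idx = rng.choice(N, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        margins = yb * (Xb @ w + b)
        # Gradient of the logistic loss log(1 + exp(-margin)) w.r.t. the margin.
        coeff = -yb / (1.0 + np.exp(margins))
        grad_w = (coeff[:, None] * Xb).mean(axis=0)
        grad_b = coeff.mean()
        w -= eta_w * grad_w
        b -= eta_b * grad_b
    return w, b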
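The NTK baseline is reported to use Neural Tangents on top of JAX, with no versions given. The sketch below shows how such a baseline is typically set up with the neural_tangents stax API; the small Conv-ReLU-Dense architecture here is a stand-in assumption, not the paper's 10-layer Wide ResNet or its small CNN, and the exact API may differ across library versions.

```python
# Hypothetical NTK baseline via the Neural Tangents library (architecture and
# hyperparameters are illustrative assumptions, not the paper's configuration).
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width CNN: Conv -> ReLU -> Flatten -> Dense readout.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(64, (3, 3), padding='SAME'),
    stax.Relu(),
    stax.Flatten(),
    stax.Dense(10),
)

def ntk_predictions(x_train, y_train, x_test):
    """Closed-form NTK regression (gradient flow on MSE) on NHWC image batches."""
    predict_fn = nt.predict.gradient_descent_mse_ensemble(
        kernel_fn, x_train, y_train, diag_reg=1e-4)
    return predict_fn(x_test=x_test, get='ntk')
```

Evaluating such a kernel predictor on the embedded-background test sets is the kind of comparison behind the reported CNN-versus-NTK gap as background intensity grows.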