Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels
Authors: Stefani Karp, Ezra Winston, Yuanzhi Li, Aarti Singh
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We supplement our theoretical results by demonstrating this phenomenon empirically: in CIFAR-10 and MNIST images with various backgrounds, as the background noise increases in intensity, a CNN's performance stays relatively robust, whereas its corresponding neural tangent kernel sees a notable drop in performance. (A hedged NTK-evaluation sketch using Neural Tangents follows the table.) |
| Researcher Affiliation | Collaboration | Stefani Karp (Carnegie Mellon University and Google Research, shkarp@cs.cmu.edu); Ezra Winston (Carnegie Mellon University, ewinston@cs.cmu.edu); Yuanzhi Li (Carnegie Mellon University, yuanzhil@cs.cmu.edu); Aarti Singh (Carnegie Mellon University, aarti@cs.cmu.edu) |
| Pseudocode | Yes | Algorithm 1 Mini-batch SGD |
| Open Source Code | Yes | Code for experiments is available at https://github.com/skarp/local-signal-adaptivity. |
| Open Datasets | Yes | We create new datasets by embedding CIFAR-10 and MNIST images within either random Gaussian or IMAGENET backgrounds. ... CIFAR-10 [Krizhevsky, 2009] ... MNIST [LeCun et al., 2010] ... IMAGENET backgrounds [Deng et al., 2009]. (A minimal background-embedding sketch follows the table.) |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, or test dataset split percentages or counts in the main text. |
| Hardware Specification | No | The paper describes the models used (e.g., '10-layer Wide ResNet', 'small CNN') but does not specify the hardware used for training or inference, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like 'Neural Tangents' and 'JAX' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We initialize b deterministically at 0. We initialize w randomly by drawing from N(0, σ0² Id / d), where σ0 is 1/poly(k). We train the above CNN using mini-batch stochastic gradient descent (SGD) with the logistic loss... We adopt a 1/poly(k) learning rate for w, and we set ηb/ηw = 1/k. ... Sample a mini-batch of examples of size n = poly(k)... T = poly(k) iterations. (A minimal JAX training-loop sketch mirroring these choices follows the table.) |
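
The Open Datasets row describes embedding CIFAR-10 and MNIST images within random Gaussian or IMAGENET backgrounds, with the background noise varied in intensity. The following is a minimal sketch of the Gaussian-background variant; the canvas size, noise scale, and random placement are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def embed_in_gaussian_background(image, canvas_size=64, noise_std=0.5, rng=None):
    """Place a small (H, W, C) image at a random location on a larger Gaussian-noise canvas.

    `canvas_size` and `noise_std` are placeholder values; the paper sweeps the
    background noise intensity rather than fixing a single level.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape                      # e.g. 32x32x3 for CIFAR-10
    canvas = rng.normal(0.0, noise_std, size=(canvas_size, canvas_size, c))
    top = rng.integers(0, canvas_size - h + 1)
    left = rng.integers(0, canvas_size - w + 1)
    canvas[top:top + h, left:left + w, :] = image
    return canvas.astype(np.float32)

# Example: embed a normalized image (values roughly in [0, 1]).
cifar_image = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in for a real CIFAR-10 image
noisy_example = embed_in_gaussian_background(cifar_image, canvas_size=64, noise_std=1.0)
```

The same helper would apply to MNIST after adding a channel dimension; for the IMAGENET-background variant, the Gaussian canvas would be replaced by a cropped natural image.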
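
The Experiment Setup row specifies zero initialization for b, Gaussian initialization for w drawn from N(0, σ0² Id / d), mini-batch SGD on the logistic loss, and a learning-rate ratio ηb/ηw = 1/k. The sketch below mirrors those choices in JAX; the predictor, the concrete values of k, d, σ0, ηw, and the batch size are placeholders, since the paper only states that these quantities scale as poly(k) or 1/poly(k).

```python
import jax
import jax.numpy as jnp

# Illustrative sizes; in the paper these scale as poly(k) or 1/poly(k).
k, d = 16, 64
sigma_0 = 1.0 / k          # stand-in for the 1/poly(k) initialization scale
eta_w = 1.0 / k            # stand-in for the 1/poly(k) learning rate for w
eta_b = eta_w / k          # enforces eta_b / eta_w = 1/k

key = jax.random.PRNGKey(0)
params = {
    "w": sigma_0 * jax.random.normal(key, (k, d)) / jnp.sqrt(d),  # w ~ N(0, sigma_0^2 Id / d)
    "b": jnp.zeros((k,)),                                         # b initialized deterministically at 0
}

def model(params, x):
    # Stand-in predictor: a sum of ReLU units. The paper's CNN applies the same
    # filters across image patches; this placeholder only illustrates the optimizer.
    return jnp.sum(jax.nn.relu(x @ params["w"].T + params["b"]), axis=-1)

def logistic_loss(params, x, y):
    # Labels y in {-1, +1}; logistic loss log(1 + exp(-y f(x))).
    margins = y * model(params, x)
    return jnp.mean(jnp.log1p(jnp.exp(-margins)))

@jax.jit
def sgd_step(params, x, y):
    # One mini-batch SGD step with separate learning rates for w and b.
    grads = jax.grad(logistic_loss)(params, x, y)
    return {
        "w": params["w"] - eta_w * grads["w"],
        "b": params["b"] - eta_b * grads["b"],
    }

# One illustrative step; the paper uses batch size n = poly(k) and T = poly(k) iterations.
xb = jax.random.normal(jax.random.PRNGKey(1), (32, d))
yb = jnp.sign(jax.random.normal(jax.random.PRNGKey(2), (32,)))
params = sgd_step(params, xb, yb)
```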
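
The Research Type row reports that a CNN stays robust as background noise grows while its corresponding neural tangent kernel degrades, and the Software Dependencies row notes that Neural Tangents and JAX were used (without versions). The snippet below is a hedged sketch of how such an NTK baseline could be evaluated with Neural Tangents; the architecture and the use of infinite-width MSE kernel regression are assumptions, not the paper's exact protocol.

```python
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax

# A small convolutional architecture in the Neural Tangents stax API.
# Depth and width are placeholders, not the paper's 10-layer Wide ResNet or small CNN.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(64, (3, 3), padding='SAME'), stax.Relu(),
    stax.Conv(64, (3, 3), padding='SAME'), stax.Relu(),
    stax.Flatten(),
    stax.Dense(1),
)

def ntk_test_accuracy(x_train, y_train, x_test, y_test):
    """Infinite-width NTK regression on +/-1 labels.

    x_* are NHWC image batches; y_* are (n, 1) arrays of +/-1 labels.
    Exact kernel regression is expensive, so in practice this would be run
    on a subset of the noisy-background dataset.
    """
    predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
    y_pred = predict_fn(x_test=x_test, get='ntk')
    return jnp.mean(jnp.sign(y_pred) == jnp.sign(y_test))
```

Sweeping the background noise level in the dataset-construction step and re-running this evaluation alongside a trained CNN would reproduce the qualitative comparison described in the Research Type row.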