Reverse Engineering the Neural Tangent Kernel

Authors: James Benjamin Simon, Sajant Anand, Michael DeWeese

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments. Our main contributions are as follows: We experimentally verify our construction and demonstrate our reverse engineering paradigm for the design of FCNs in two experiments: (a) we engineer a single-hidden-layer, finite-width FCN that mimics the training and generalization behavior of a deep ReLU FCN over a range of network widths, and (b) we design an FCN that significantly outperforms ReLU FCNs on a synthetic parity problem. (Section 4, "Experimental results for finite networks")
Researcher Affiliation | Academia | Department of Physics, University of California, Berkeley, Berkeley, CA 94720; Redwood Center for Theoretical Neuroscience and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720.
Pseudocode | No | The paper describes methods using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code to reproduce experiments is available at https://github.com/james-simon/reverse-engineering.
Open Datasets | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task (800 train samples, 799 test samples, 11 features, 6 classes)... All datasets except CIFAR-10 were taken from the UCI repository (Dua & Graff, 2017). CIFAR-10 (Krizhevsky, 2009).
Dataset Splits | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task (800 train samples, 799 test samples, 11 features, 6 classes)... Using k = 3-fold cross-validation on the training data, we choose the optimal stopping time for each net and then train networks of varying width, again averaging results over five random initializations. (A data-loading and splitting sketch follows the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments, only general statements about network training.
Software Dependencies | No | The paper names its software stack ("All experiments use JAX (Bradbury et al., 2018) and neural tangents (Novak et al., 2019b) for network training and kernel computation.") but does not pin library versions or provide an environment specification.
Experiment Setup | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task... with full-batch gradient descent, mean-squared-error (MSE) loss, and step size 0.1. For all engineered 1HL networks, we use σw = 1, σb = 0. For all ReLU and erf networks, we use σw = 2, σb = 0.1 for all layers except the readout layer, for which we use σw = 1, σb = 0. (A network-definition and training sketch follows the table.)
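
The Open Datasets and Dataset Splits rows describe an 800/799 train/test split of UCI wine-quality-red with k = 3-fold cross-validation used to choose a stopping time. The sketch below is a minimal illustration of that split under stated assumptions, not the authors' code: the download URL, random seed, and fold construction are my own choices for illustration.

```python
# Minimal sketch (not the authors' code): load UCI wine-quality-red and build
# the 800/799 train/test split and k = 3 cross-validation folds quoted above.
# The download URL, seed, and fold construction are illustrative assumptions.
import numpy as np
import pandas as pd

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-red.csv")
df = pd.read_csv(URL, sep=";")                    # 1599 rows, 11 features + quality label

X = df.drop(columns="quality").to_numpy(np.float32)
y = df["quality"].to_numpy()

rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
train_idx, test_idx = perm[:800], perm[800:]      # 800 train / 799 test samples

# 3-fold cross-validation over the training set, used to pick a stopping time
# before retraining networks of varying width on the full training split.
folds = np.array_split(train_idx, 3)
for k in range(3):
    val_idx = folds[k]
    fit_idx = np.concatenate([folds[j] for j in range(3) if j != k])
    # ... train on (X[fit_idx], y[fit_idx]), track loss on (X[val_idx], y[val_idx]) ...
```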
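The Software Dependencies and Experiment Setup rows point to JAX and neural_tangents, width-4096 one-hidden-layer networks, full-batch gradient descent with MSE loss and step size 0.1, and the σw/σb initializations quoted above. The sketch below shows one way those pieces could fit together; mapping σw and σb onto stax's W_std and b_std arguments, the placeholder targets, and the step count are assumptions, and the paper's engineered activation ϕ is replaced here by a stock ReLU.

```python
# Minimal sketch (not the authors' code): a width-4096 one-hidden-layer FCN built
# with neural_tangents' stax API, trained by full-batch gradient descent on MSE
# with step size 0.1. The sigma_w/sigma_b -> W_std/b_std mapping, the targets,
# and the step count are assumptions; the engineered activation phi is not shown.
import jax
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(4096, W_std=2.0, b_std=0.1),  # hidden layer: sigma_w = 2, sigma_b = 0.1
    stax.Relu(),
    stax.Dense(1, W_std=1.0, b_std=0.0),     # readout layer: sigma_w = 1, sigma_b = 0
)

key = random.PRNGKey(0)
x_train = random.normal(key, (800, 11))      # stand-in for the 800 x 11 wine inputs
y_train = jnp.zeros((800, 1))                # placeholder regression targets

_, params = init_fn(key, x_train.shape)      # finite-width parameters
ntk = kernel_fn(x_train[:16], x_train[:16], "ntk")   # infinite-width NTK Gram matrix

def mse_loss(params, x, y):
    return jnp.mean((apply_fn(params, x) - y) ** 2)

@jax.jit
def gd_step(params):
    grads = jax.grad(mse_loss)(params, x_train, y_train)
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)

for _ in range(1000):                        # stopping time would come from 3-fold CV
    params = gd_step(params)
```

The same stax.serial construction yields both the finite-width network (via init_fn/apply_fn) and its infinite-width NTK (via kernel_fn), which is the comparison the paper's experiments rely on.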