Reverse Engineering the Neural Tangent Kernel
Authors: James Benjamin Simon, Sajant Anand, Mike DeWeese
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments. Our main contributions are as follows: We experimentally verify our construction and demonstrate our reverse engineering paradigm for the design of FCNs in two experiments: (a) we engineer a single-hidden-layer, finite-width FCN that mimics the training and generalization behavior of a deep ReLU FCN over a range of network widths, and (b) we design an FCN that significantly outperforms ReLU FCNs on a synthetic parity problem. (Section 4: Experimental results for finite networks) |
| Researcher Affiliation | Academia | (1) Department of Physics, University of California, Berkeley, Berkeley, CA 94720; (2) Redwood Center for Theoretical Neuroscience and Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720. |
| Pseudocode | No | The paper describes methods using prose and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce experiments is available at https://github.com/james-simon/reverse-engineering. |
| Open Datasets | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task (800 train samples, 799 test samples, 11 features, 6 classes)... All datasets except CIFAR-10 were taken from the UCI repository (Dua & Graff, 2017). CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task (800 train samples, 799 test samples, 11 features, 6 classes)... Using k = 3-fold cross-validation on the training data, we choose the optimal stopping time for each net and then train networks of varying width, again averaging results over five random initializations. (A cross-validation sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments, only general statements about network training. |
| Software Dependencies | No | The paper names its software stack, "All experiments use JAX (Bradbury et al., 2018) and neural tangents (Novak et al., 2019b) for network training and kernel computation," but does not specify version numbers for these dependencies. (A minimal usage sketch follows the table.) |
| Experiment Setup | Yes | We train width 4096 1HL ReLU, 4HL ReLU, and 1HL ϕ networks on the UCI wine-quality-red task... with full-batch gradient descent, mean-squared-error (MSE) loss, and step size 0.1. For all engineered 1HL networks, we use σw = 1, σb = 0. For all ReLU and erf networks, we use σw = 2, σb = 0.1 for all layers except the readout layer, for which we use σw = 1, σb = 0. (A training-loop sketch follows the table.) |
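
The dependency row above quotes the paper's use of JAX and the neural-tangents library for kernel computation. As a hedged illustration of what that computation might look like, the sketch below builds a 1-hidden-layer fully connected model with neural-tangents' `stax` API and evaluates its infinite-width NTK; the hidden width, nonlinearity, and toy inputs are placeholders, not the paper's exact configuration.

```python
import jax.numpy as jnp
from neural_tangents import stax

# A 1-hidden-layer FCN; the hidden width and init scales are placeholders.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(4096),
    stax.Relu(),
    stax.Dense(1),
)

# Toy inputs with 11 features, matching the wine-quality-red feature count.
x_train = jnp.ones((8, 11))
x_test = jnp.ones((4, 11))

# Closed-form infinite-width NTK block between test and train points.
ntk = kernel_fn(x_test, x_train, 'ntk')
print(ntk.shape)  # (4, 8)
```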
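
The experiment-setup row reports full-batch gradient descent on MSE loss with step size 0.1. Below is a minimal JAX sketch of such a training loop, assuming a finite-width network built with the same `stax` API; the width, random data, and step count are placeholders rather than the paper's exact configuration.

```python
import jax
import jax.numpy as jnp
from neural_tangents import stax

# Hypothetical 1HL network; width and init scales are placeholders.
init_fn, apply_fn, _ = stax.serial(stax.Dense(4096), stax.Relu(), stax.Dense(1))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (800, 11))  # stand-in for the 800 wine-quality-red training points
y = jax.random.normal(key, (800, 1))   # stand-in targets

_, params = init_fn(key, x.shape)

def mse_loss(params, x, y):
    # Mean-squared-error loss, as reported in the setup.
    return jnp.mean((apply_fn(params, x) - y) ** 2)

@jax.jit
def gd_step(params, x, y, lr=0.1):
    # One full-batch gradient-descent step with step size 0.1.
    grads = jax.grad(mse_loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

for _ in range(1000):  # step count is a placeholder
    params = gd_step(params, x, y)
```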
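
The dataset-splits row mentions choosing the optimal stopping time by k = 3-fold cross-validation on the training data. One way that selection could be organized is sketched below; `train_fn` is a hypothetical helper (not from the paper or its codebase) that trains a fresh network and returns the validation MSE at each candidate stopping step.

```python
import numpy as np

def choose_stopping_time(train_fn, x, y, steps_grid, k=3, seed=0):
    """Return the step in steps_grid with the lowest mean held-out MSE over k folds."""
    folds = np.array_split(np.random.RandomState(seed).permutation(len(x)), k)
    val_curves = []
    for i in range(k):
        val_idx = folds[i]
        tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # train_fn is a hypothetical helper: trains on this fold's training split and
        # reports the validation MSE at every candidate stopping step.
        val_curves.append(train_fn(x[tr_idx], y[tr_idx], x[val_idx], y[val_idx], steps_grid))
    mean_curve = np.mean(val_curves, axis=0)
    return steps_grid[int(np.argmin(mean_curve))]
```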