When Do Neural Networks Outperform Kernel Methods?
Authors: Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1 we carry out such an experiment using Fashion MNIST (FMNIST) data (d = 784, n = 60000, 10 classes). We compare two-layer NNs with the RF and NT models. We choose the architectures of NN, NT, RF so as to match the number of parameters: namely we used N = 4096 for NN and NT and N = 321126 for RF. We also fit the corresponding RKHS models (corresponding to N = ∞) using kernel ridge regression (KRR), and two simple polynomial models: f_ℓ(x) = Σ_{k=0}^{ℓ} ⟨B_k, x^{⊗k}⟩, for ℓ ∈ {1, 2}. In the unperturbed dataset, all of these approaches have comparable accuracies (except the linear fit). As noise is added, RF, NT, and RKHS methods deteriorate rapidly. While the accuracy of NN decreases as well, it significantly outperforms the other methods. |
| Researcher Affiliation | Collaboration | Department of Electrical Engineering, Stanford University Department of Statistics, University of California, Berkeley Department of Statistics, Stanford University Google Research, Brain Team |
| Pseudocode | No | The paper describes models and theoretical results but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code used to produce our results can be accessed at https://github.com/bGhorbani/linearized_neural_networks. |
| Open Datasets | Yes | In Figure 1 we carry out such an experiment using Fashion MNIST (FMNIST) data (d = 784, n = 60000, 10 classes). |
| Dataset Splits | No | The paper mentions training on FMNIST and CIFAR-10 data but does not explicitly state the train/validation/test splits or ratios. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or cloud computing instances). |
| Software Dependencies | No | The paper mentions using "ReLU activations" but does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch versions) that would be needed for replication. |
| Experiment Setup | Yes | We choose the architectures of NN, NT, RF so as to match the number of parameters: namely we used N = 4096 for NN and NT and N = 321126 for RF. We also fit the corresponding RKHS models (corresponding to N = ∞) using kernel ridge regression (KRR), and two simple polynomial models: f_ℓ(x) = Σ_{k=0}^{ℓ} ⟨B_k, x^{⊗k}⟩, for ℓ ∈ {1, 2}. Throughout we use ReLU activations. |
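The experiment-setup cell above mentions two baselines that are easy to reproduce in spirit: the RKHS model fit by kernel ridge regression, and the degree-ℓ polynomial fits f_ℓ(x) = Σ_{k=0}^{ℓ} ⟨B_k, x^{⊗k}⟩. The sketch below is not the authors' code: it uses a small synthetic dataset in place of FMNIST, an RBF kernel as a stand-in for the paper's RF/NT kernels, and placeholder sizes and regularization values.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Small synthetic stand-in for the FMNIST setup (the paper uses d = 784, n = 60000).
d, n = 20, 500
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1] + 0.5 * X[:, 2])  # a simple nonlinear target

# RKHS model (the N = infinity limit of RF/NT) via kernel ridge regression.
# An RBF kernel is used here purely for illustration.
krr = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0 / d).fit(X, y)
krr_acc = np.mean(np.sign(krr.predict(X)) == y)

# Degree-1 and degree-2 polynomial models f_l(x) = sum_{k<=l} <B_k, x^(tensor k)>,
# realized as ridge regression on explicit polynomial features.
poly_accs = []
for degree in (1, 2):
    Phi = PolynomialFeatures(degree=degree).fit_transform(X)
    poly = Ridge(alpha=1e-3).fit(Phi, y)
    poly_accs.append(np.mean(np.sign(poly.predict(Phi)) == y))

print(f"KRR train accuracy: {krr_acc:.2f}")
print(f"degree-1 / degree-2 polynomial train accuracy: "
      f"{poly_accs[0]:.2f} / {poly_accs[1]:.2f}")
```

To mirror the paper's noise experiment, one would re-fit each model after adding label or input noise and compare how quickly the kernel and polynomial accuracies degrade.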