Deep Equals Shallow for ReLU Networks in Kernel Regimes
Authors: Alberto Bietti, Francis Bach
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Numerical experiments): We now present numerical experiments on synthetic and real data to illustrate our theory. Our code is available at https://github.com/albietz/deep_shallow_kernel. Synthetic experiments: We consider randomly sampled inputs on the sphere S^3 in 4 dimensions, and outputs generated according to the following target models... MNIST and Fashion-MNIST: In Table 1, we consider the image classification datasets MNIST and Fashion-MNIST, which both consist of 60k training and 10k test images... (A sketch of the sphere sampling appears after the table.) |
| Researcher Affiliation | Academia | Alberto Bietti (NYU, alberto.bietti@nyu.edu); Francis Bach (Inria, francis.bach@inria.fr) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/albietz/deep_shallow_kernel. |
| Open Datasets | Yes | MNIST and Fashion-MNIST. In Table 1, we consider the image classification datasets MNIST and Fashion-MNIST, which both consist of 60k training and 10k test images of size 28x28 with 10 output classes. |
| Dataset Splits | Yes | We train on random subsets of 50k examples and use the remaining 10k examples for validation. (A loading-and-split sketch appears after the table.) |
| Hardware Specification | No | The paper does not specify the hardware used to run its experiments (GPU/CPU models, clock speeds, or memory amounts). |
| Software Dependencies | No | The paper mentions that code is available, but does not provide specific ancillary software details (e.g., library or solver names with version numbers). |
| Experiment Setup | Yes | The regularization parameter λ is optimized on 10 000 test datapoints on a logarithmic grid. (...) We evaluate one-versus-all classifiers obtained by using kernel ridge regression by setting y = 0.9 for the correct label and y = 0.1 otherwise. (...) We train on random subsets of 50k examples and use the remaining 10k examples for validation. (A kernel ridge regression sketch with this label encoding and λ grid appears after the table.) |
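
Below is a minimal sketch of the synthetic input generation quoted in the Research Type row, assuming the standard Gaussian-normalization trick for uniform sampling on a sphere. The paper's target models are elided in the quote above and are not reproduced here; the function name `sample_sphere` is ours, not from the linked repository.

```python
import numpy as np

def sample_sphere(n, d=4, seed=0):
    """Draw n points uniformly on the unit sphere S^{d-1} by
    normalizing i.i.d. standard Gaussian vectors (a standard trick;
    the paper's own sampling code lives in the linked repository)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

X = sample_sphere(1000)  # inputs on S^3, i.e. the unit sphere in 4 dimensions
assert np.allclose(np.linalg.norm(X, axis=1), 1.0)
```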
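
The Dataset Splits row describes a random 50k/10k split of the 60k MNIST training images. A sketch of that split, assuming torchvision as the loader (the paper does not name one, and the seed is an arbitrary choice):

```python
import numpy as np
from torchvision import datasets  # assumption: any MNIST loader would work

# Load the 60k MNIST training images and flatten to 784-dimensional vectors.
train = datasets.MNIST("data", train=True, download=True)
X = train.data.numpy().reshape(-1, 28 * 28) / 255.0
y = train.targets.numpy()

# Random 50k/10k train/validation split, as described in the table.
rng = np.random.default_rng(0)
perm = rng.permutation(len(y))
X_tr, y_tr = X[perm[:50_000]], y[perm[:50_000]]
X_val, y_val = X[perm[50_000:]], y[perm[50_000:]]
```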
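
The Experiment Setup row combines one-versus-all kernel ridge regression, a 0.9/0.1 label encoding, and a logarithmic grid for λ. A self-contained sketch of that procedure follows; the grid bounds and the λ·n scaling inside the linear solve are our assumptions, not values stated by the paper.

```python
import numpy as np

def krr_one_vs_all(K_tr, K_val, y_tr, y_val, n_classes=10,
                   lambdas=np.logspace(-8, 2, 11)):
    """One-versus-all kernel ridge regression with the label encoding
    quoted above: 0.9 for the correct class, 0.1 otherwise. Lambda is
    selected on held-out data over a logarithmic grid (grid bounds and
    the lam * n scaling are illustrative assumptions)."""
    n = K_tr.shape[0]
    Y = np.full((n, n_classes), 0.1)
    Y[np.arange(n), y_tr] = 0.9  # mark the correct class for each example
    best_acc, best_lam = -1.0, None
    for lam in lambdas:
        # Closed-form KRR solution, one column of coefficients per class.
        alpha = np.linalg.solve(K_tr + lam * n * np.eye(n), Y)
        acc = ((K_val @ alpha).argmax(axis=1) == y_val).mean()
        if acc > best_acc:
            best_acc, best_lam = acc, lam
    return best_lam, best_acc
```

Here K_tr (train × train) and K_val (validation × train) would be kernel matrices for the deep or shallow kernels studied in the paper; their construction is in the linked repository and is not reproduced here.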