To Understand Deep Learning We Need to Understand Kernel Learning

Authors: Mikhail Belkin, Siyuan Ma, Soumik Mandal

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Using six real-world and two synthetic datasets, we establish experimentally that kernel machines trained to have zero classification error or near-zero regression error (interpolation) perform very well on test data." (See the sketch after this table.) |
| Researcher Affiliation | Academia | "Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA." |
| Pseudocode | No | The paper contains no pseudocode blocks or clearly labeled algorithm sections; methods are described through mathematical equations and textual explanations. |
| Open Source Code | No | The paper neither provides a link to its own open-source code nor explicitly states that code for the described methodology is available. It refers only to a previous work's method: "We used EigenPro-SGD (Ma & Belkin, 2017)". |
| Open Datasets | Yes | "We use six real-world datasets (Section 3) and two synthetic datasets (Section 4) to demonstrate the ubiquity of this behavior. [...] We use standard datasets, including CIFAR-10, SVHN, TIMIT, HINT-S, 20 Newsgroups, MNIST." |
| Dataset Splits | No | The paper mentions "training square loss" and "test error" but does not give explicit percentages or counts for the training/validation/test splits (e.g., an 80/10/10 split). Although it uses standard datasets, the split methodology needed for reproduction is not detailed. |
| Hardware Specification | Yes | "We used a Titan Xp GPU provided by Nvidia." |
| Software Dependencies | No | The paper names EigenPro-SGD (Ma & Belkin, 2017) as its training method and mentions other tools and libraries by name (e.g., Pegasos, scikit-learn), but it provides no version numbers for any software component required for reproducibility. |
| Experiment Setup | No | The paper mentions the EigenPro-SGD method and training "epochs", but it omits specific hyperparameter values such as the learning rate, batch size, and the exact value of the "fixed kernel parameter" cited in a figure caption, all of which are needed to reproduce the experiments. |
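
The interpolation phenomenon assessed above can be reproduced in miniature. The sketch below is not the authors' EigenPro-SGD pipeline and uses none of the paper's datasets: it is a minimal, assumed setup that fits scikit-learn's KernelRidge with a vanishingly small ridge penalty, so the kernel machine essentially interpolates the training labels, on synthetic data with an illustrative 80/20 split chosen here because the paper specifies none.

```python
# Minimal sketch (NOT the paper's EigenPro-SGD method): drive a Gaussian-kernel
# machine toward exact interpolation of the training labels and check that the
# test error nonetheless stays low. Synthetic data; all parameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0)  # assumed 80/20 split; the paper gives none

# Code labels as +/-1 and shrink the ridge term toward zero: alpha -> 0 makes
# the fitted function pass (numerically) through the training labels.
s_tr, s_te = 2.0 * y_tr - 1.0, 2.0 * y_te - 1.0
clf = KernelRidge(kernel="rbf", gamma=0.05, alpha=1e-8)
clf.fit(X_tr, s_tr)

train_err = np.mean(np.sign(clf.predict(X_tr)) != s_tr)  # expect (near) zero
test_err = np.mean(np.sign(clf.predict(X_te)) != s_te)   # typically stays low
print(f"train 0-1 error: {train_err:.4f}   test 0-1 error: {test_err:.4f}")
```

In this setting the training 0-1 error is (near) zero by construction, while the test error typically remains low, which is the qualitative behavior the paper reports at scale on CIFAR-10, SVHN, TIMIT, and its other datasets.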