A Kernel Perspective for Regularizing Deep Neural Networks
Authors: Alberto Bietti, Grégoire Mialon, Dexiong Chen, Julien Mairal
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested the regularization strategies presented in Section 2 in the context of improving generalization on small datasets and training robust models. Our goal is to take common architectures designed for large datasets and improve their performance in different settings through regularization. Our PyTorch implementation of the various strategies is available at https://github.com/albietz/kernel_reg. ... Table 1. Regularization on CIFAR10 with 1 000 examples for VGG-11 and ResNet-18. Each entry shows the test accuracy with/without data augmentation when all hyper-parameters are optimized on a validation set. (A minimal sketch of one such penalty follows the table.) |
| Researcher Affiliation | Academia | (1) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; (2) Département d'informatique de l'ENS, ENS, CNRS, Inria, PSL, 75005 Paris, France. |
| Pseudocode | No | The paper describes its methods and approaches using mathematical formulations and descriptive text, but it does not include any formally presented pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our PyTorch implementation of the various strategies is available at https://github.com/albietz/kernel_reg. |
| Open Datasets | Yes | We consider the datasets CIFAR10 and MNIST when using a small number of training examples, as well as 102 datasets of biological sequences that suffer from small sample size. ... We consider the Structural Classification Of Proteins (SCOP) version 1.67 dataset (Murzin et al., 1995) |
| Dataset Splits | Yes | In order to study the potential effectiveness of each method, we assume that a reasonably large validation set is available to select hyper-parameters; thus, we keep 10 000 annotated examples for this purpose. ... This allows us to use the first 51 datasets as a validation set for hyper-parameter tuning, and we report average performance with these fixed choices on the remaining 51 datasets. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Our PyTorch implementation' but does not specify a version number for PyTorch or for any other software dependency. |
| Experiment Setup | Yes | Each strategy derived in Section 2 is trained for 500 epochs using SGD with momentum and batch size 128, halving the step-size every 40 epochs. ... Training was done using Adam with a learning rate fixed to 0.01, and a weight decay parameter tuned for each method. (A training-schedule sketch follows the table.) |
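For readers assessing the "Research Type" row: the paper's Section 2 derives regularization penalties such as a squared penalty on the gradient of the loss with respect to the input. The sketch below is a minimal illustration of such a penalty in PyTorch, not the authors' implementation (that lives at https://github.com/albietz/kernel_reg); the function name and the weight `lambda_grad` are hypothetical.

```python
import torch
import torch.nn.functional as F

def loss_with_grad_penalty(model, x, y, lambda_grad=0.1):
    """Cross-entropy plus a squared input-gradient penalty.

    A hedged sketch in the spirit of the paper's Section 2 penalties;
    `lambda_grad` is a hypothetical hyper-parameter name and value.
    """
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the input; create_graph=True lets the
    # penalty itself be back-propagated through the model parameters.
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)
    penalty = grad_x.flatten(1).pow(2).sum(dim=1).mean()
    return ce + lambda_grad * penalty
```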
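The split protocol quoted in the "Dataset Splits" row (10 000 held-out validation examples, small training subsets) can be reproduced along the following lines. The random seed and the 1 000-example training size (matching Table 1) are choices of this sketch, not values mandated by the paper.

```python
import torch
from torchvision import datasets, transforms

full = datasets.CIFAR10(root="./data", train=True, download=True,
                        transform=transforms.ToTensor())

# Hold out 10 000 examples for hyper-parameter selection, as quoted above;
# train on a small subset (1 000 here, as in Table 1). The seed is an
# arbitrary choice made for repeatability.
gen = torch.Generator().manual_seed(0)
perm = torch.randperm(len(full), generator=gen)
val_set = torch.utils.data.Subset(full, perm[:10_000].tolist())
train_set = torch.utils.data.Subset(full, perm[10_000:11_000].tolist())
```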
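Finally, the quoted CIFAR10/MNIST setup (SGD with momentum, batch size 128, 500 epochs, step-size halved every 40 epochs) maps directly onto a `StepLR` schedule. The initial learning rate and momentum below are assumptions, since the quote does not fix them; `model`, `train_set`, and `loss_with_grad_penalty` come from the sketches above.

```python
import torch

# lr=0.1 and momentum=0.9 are assumed values, not taken from the paper.
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)

for epoch in range(500):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_with_grad_penalty(model, x, y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # halves the learning rate every 40 epochs
```

For the protein (SCOP) experiments, the quote instead specifies Adam with a fixed learning rate of 0.01 and a tuned weight decay, i.e. something like `torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=wd)` with `wd` selected per method.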