Strong inductive biases provably prevent harmless interpolation
Authors: Michael Aerni, Marco Milanta, Konstantin Donhauser, Fanny Yang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels, where the filter size regulates the strength of the inductive bias. We further provide empirical evidence of the same behavior for deep neural networks with varying filter sizes and rotational invariance. |
| Researcher Affiliation | Academia | Michael Aerni¹, Marco Milanta¹, Konstantin Donhauser¹,², Fanny Yang¹; ¹Department of Computer Science, ETH Zurich; ²ETH AI Center |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to replicate all experiments and plots in https://github.com/michaelaerni/iclr23-InductiveBiasesHarmlessInterpolation. |
| Open Datasets | Yes | As an example dataset with a rotationally invariant ground truth, we classify satellite images from the EuroSAT dataset (Helber et al., 2018) into 10 types of land usage. |
| Dataset Splits | No | The paper specifies '200 training samples' and '100k test samples' for synthetic images, and '7680 raw training and 10k raw test samples' for EuroSAT, but does not explicitly mention a validation set or split percentages for training, validation, and test. |
| Hardware Specification | No | The paper alludes to GPU usage only implicitly (e.g., via CUDA in the deep learning context), but does not specify any particular GPU models, CPU models, or other hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch weight initialization' and 'PyTorch', but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Optimization minimizes the logistic loss for 300 epochs of mini-batch SGD with momentum 0.9 and batch size 100. We linearly increase the learning rate from 10⁻⁶ to a peak value of 0.2 during the first 50 epochs, and then reduce the learning rate according to an inverse square-root decay every 20 epochs. |
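
The schedule quoted in the Experiment Setup row translates directly into a short training loop. Below is a minimal PyTorch sketch of that setup, assuming a binary logistic loss and a toy linear model on random data as stand-ins; the exact form of the inverse square-root decay, the model, and the dataset are assumptions for illustration, not taken from the authors' released code.

```python
import math

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Training hyperparameters quoted in the table row above.
EPOCHS = 300            # 300 epochs of mini-batch SGD
BATCH_SIZE = 100        # batch size 100
MOMENTUM = 0.9          # SGD momentum
WARMUP_EPOCHS = 50      # linear warmup during the first 50 epochs
LR_START, LR_PEAK = 1e-6, 0.2
DECAY_EVERY = 20        # a decay step every 20 epochs after warmup


def lr_at_epoch(epoch: int) -> float:
    """Linear warmup to the peak, then an (assumed) inverse square-root decay."""
    if epoch < WARMUP_EPOCHS:
        return LR_START + (epoch / WARMUP_EPOCHS) * (LR_PEAK - LR_START)
    steps = (epoch - WARMUP_EPOCHS) // DECAY_EVERY  # completed decay steps
    return LR_PEAK / math.sqrt(steps + 1)


# Hypothetical stand-ins so the sketch runs end to end: a linear model on
# 200 random "images" (the paper instead trains CNNs with varying filter
# sizes on its synthetic task and on EuroSAT).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
data = TensorDataset(torch.randn(200, 3, 32, 32),
                     torch.randint(0, 2, (200,)).float())
train_loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=LR_START, momentum=MOMENTUM)
loss_fn = nn.BCEWithLogitsLoss()  # logistic loss (binary case for simplicity)

for epoch in range(EPOCHS):
    for group in optimizer.param_groups:   # set this epoch's learning rate
        group["lr"] = lr_at_epoch(epoch)
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs).squeeze(-1), targets)
        loss.backward()
        optimizer.step()
```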