Strong inductive biases provably prevent harmless interpolation

Authors: Michael Aerni, Marco Milanta, Konstantin Donhauser, Fanny Yang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main theoretical result establishes tight non-asymptotic bounds for high-dimensional kernel regression that reflect this phenomenon for convolutional kernels, where the filter size regulates the strength of the inductive bias. We further provide empirical evidence of the same behavior for deep neural networks with varying filter sizes and rotational invariance.
Researcher Affiliation | Academia | Michael Aerni (1), Marco Milanta (1), Konstantin Donhauser (1,2), Fanny Yang (1); (1) Department of Computer Science, ETH Zurich; (2) ETH AI Center
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the code to replicate all experiments and plots in https://github.com/michaelaerni/iclr23-InductiveBiasesHarmlessInterpolation.
Open Datasets | Yes | As an example dataset with a rotationally invariant ground truth, we classify satellite images from the EuroSAT dataset (Helber et al., 2018) into 10 types of land usage.
Dataset Splits | No | The paper specifies '200 training samples' and '100k test samples' for synthetic images, and '7680 raw training and 10k raw test samples' for EuroSAT, but does not explicitly mention a validation set or split percentages for training, validation, and test.
Hardware Specification | No | The paper alludes to GPUs implicitly through mentions of CUDA, but does not specify any particular GPU models, CPU models, or other hardware used for running the experiments.
Software Dependencies | No | The paper mentions 'PyTorch weight initialization' and 'PyTorch', but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Optimization minimizes the logistic loss for 300 epochs of mini-batch SGD with momentum 0.9 and batch size 100. We linearly increase the learning rate from 10^-6 to a peak value of 0.2 during the first 50 epochs, and then reduce the learning rate according to an inverse square-root decay every 20 epochs.
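
The Experiment Setup quote above specifies the optimizer and schedule but not the exact decay formula. The sketch below shows one way to implement that schedule in PyTorch; the model is a placeholder, and the inverse square-root decay is interpreted as scaling the peak rate by 1/sqrt(k+1) after k completed 20-epoch intervals, which is an assumption rather than the authors' implementation (their repository is the authoritative reference).

```python
# Minimal sketch (not the authors' code) of the schedule quoted above:
# 300 epochs of SGD, momentum 0.9, batch size 100, linear warmup from 1e-6
# to 0.2 over the first 50 epochs, then inverse square-root decay every 20 epochs.
import math
import torch

INITIAL_LR, PEAK_LR = 1e-6, 0.2
WARMUP_EPOCHS, TOTAL_EPOCHS, DECAY_EVERY = 50, 300, 20

def lr_factor(epoch: int) -> float:
    """Factor multiplied onto PEAK_LR at the start of `epoch` (0-indexed)."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup from INITIAL_LR to PEAK_LR.
        frac = epoch / WARMUP_EPOCHS
        return (INITIAL_LR + frac * (PEAK_LR - INITIAL_LR)) / PEAK_LR
    # Assumed form of the inverse square-root decay, stepped every 20 epochs.
    steps = (epoch - WARMUP_EPOCHS) // DECAY_EVERY
    return 1.0 / math.sqrt(steps + 1)

model = torch.nn.Linear(64, 10)  # placeholder for the paper's networks
optimizer = torch.optim.SGD(model.parameters(), lr=PEAK_LR, momentum=0.9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
criterion = torch.nn.CrossEntropyLoss()  # multi-class logistic loss

for epoch in range(TOTAL_EPOCHS):
    # ... run mini-batches of size 100: forward pass, criterion(outputs, labels),
    # loss.backward(), optimizer.step(), optimizer.zero_grad() ...
    scheduler.step()  # advance the learning-rate schedule once per epoch
```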
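
The EuroSAT data referenced in the Open Datasets row is publicly available; a hedged sketch of loading it through torchvision (0.13 or later) follows. This is not the authors' data pipeline, and their subsampling into 7680 raw training and 10k raw test images is not reproduced here.

```python
# Hedged sketch: loading the public EuroSAT RGB dataset (27,000 64x64 satellite
# images, 10 land-usage classes) with torchvision. Not the authors' pipeline.
import torch
from torchvision import datasets, transforms

eurosat = datasets.EuroSAT(
    root="data",
    transform=transforms.ToTensor(),
    download=True,  # fetches the dataset on first use
)

loader = torch.utils.data.DataLoader(eurosat, batch_size=100, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # expected: torch.Size([100, 3, 64, 64])
```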