Dataset Meta-Learning from Kernel Ridge-Regression

Authors: Timothy Nguyen, Zhourong Chen, Jaehoon Lee

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform three sets of experiments to validate the efficacy of KIP and LS for dataset learning. We focus on MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009) datasets for comparison to previous methods."
Researcher Affiliation | Industry | "Timothy Nguyen, Zhourong Chen, Jaehoon Lee. Google Research. {timothycnguyen, zrchen, jaehlee}@google.com"
Pseudocode | Yes | "Algorithm 1: Kernel Inducing Point (KIP)" (a hedged sketch of this update appears after the table)
Open Source Code | Yes | "We provide an open source implementation of KIP and LS, available in an interactive Colab notebook: https://colab.research.google.com/github/google-research/google-research/blob/master/kip/KIP.ipynb"
Open Datasets | Yes | "We focus on MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009) datasets for comparison to previous methods."
Dataset Splits | No | "We could have used a validation dataset for a stopping criterion, but that would have required reducing the target dataset from the entire training dataset."
Hardware Specification | Yes | "using a single V100 GPU with 16GB of RAM"
Software Dependencies | No | "All our kernel-based experiments use the Neural Tangents library (Novak et al., 2020), built on top of JAX (Bradbury et al., 2018)."
Experiment Setup | Yes | "In all KIP trainings, we used the Adam optimizer. All our labels are mean-centered 1-hot labels. We used learning rates 0.01 and 0.04 for the MNIST and CIFAR-10 datasets, respectively. When sampling target batches, we always do so in a class-balanced way. All datasets are preprocessed using channel-wise standardization (i.e. mean subtraction and division by standard deviation). For neural (tangent) kernels, we always use weight and bias variance σ_w^2 = 2 and σ_b^2 = 10^-4, respectively. For both neural kernels and neural networks, we always use ReLU activation. Convolutional layers all use a (3, 3) filter with stride 1 and same padding. We train KIP for 10-20k iterations and took 5 random subsets of images for initializations." (a configuration sketch follows the table)
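
The pseudocode row above refers to Algorithm 1 (Kernel Inducing Point). Below is a minimal, hedged sketch of one KIP update in JAX: the support set is learned by differentiating the kernel ridge-regression loss of the support set evaluated on a sampled target batch. It is not the authors' released code; the RBF kernel, the plain-SGD step, and names such as `kip_loss` and `reg` are placeholder assumptions (the paper uses neural tangent kernels from Neural Tangents and the Adam optimizer).

```python
# Hedged sketch of a KIP-style update, not the authors' implementation.
import jax
import jax.numpy as jnp

def rbf_kernel(x1, x2, gamma=1e-3):
    # Placeholder kernel; the paper uses neural (tangent) kernels instead.
    sq = jnp.sum(x1**2, 1)[:, None] + jnp.sum(x2**2, 1)[None, :] - 2.0 * x1 @ x2.T
    return jnp.exp(-gamma * sq)

def kip_loss(x_support, y_support, x_target, y_target, reg=1e-6):
    # Kernel ridge-regression loss of the support set on a target batch:
    # 0.5 * || y_t - K_ts (K_ss + reg*I)^{-1} y_s ||^2
    k_ss = rbf_kernel(x_support, x_support)
    k_ts = rbf_kernel(x_target, x_support)
    pred = k_ts @ jnp.linalg.solve(k_ss + reg * jnp.eye(k_ss.shape[0]), y_support)
    return 0.5 * jnp.sum((y_target - pred) ** 2)

# One update: gradient of the loss w.r.t. the support images only
# (plain gradient descent shown for brevity; the paper uses Adam).
grad_fn = jax.grad(kip_loss, argnums=0)

def kip_step(x_support, y_support, x_target, y_target, lr=0.01):
    return x_support - lr * grad_fn(x_support, y_support, x_target, y_target)
```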
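
The experiment-setup row quotes a ReLU convolutional kernel with (3, 3) filters, stride 1, SAME padding, σ_w^2 = 2, σ_b^2 = 10^-4, and channel-wise standardization. The sketch below shows one way to express that configuration with the Neural Tangents `stax` API; the network depth (3 conv layers) and output width are illustrative assumptions, not taken from the quote.

```python
# Hedged configuration sketch, not the authors' released notebook.
import numpy as np
from neural_tangents import stax

W_STD = np.sqrt(2.0)   # weight variance sigma_w^2 = 2
B_STD = np.sqrt(1e-4)  # bias variance sigma_b^2 = 1e-4

def conv_ntk_fn():
    # ReLU ConvNet with (3, 3) filters, stride 1, SAME padding, as quoted above.
    layers = []
    for _ in range(3):  # depth is an assumption for illustration
        layers += [
            stax.Conv(256, (3, 3), strides=(1, 1), padding='SAME',
                      W_std=W_STD, b_std=B_STD),
            stax.Relu(),
        ]
    layers += [stax.Flatten(), stax.Dense(10, W_std=W_STD, b_std=B_STD)]
    _, _, kernel_fn = stax.serial(*layers)
    return kernel_fn  # kernel_fn(x1, x2, 'ntk') evaluates the neural tangent kernel

def standardize(images):
    # Channel-wise standardization: subtract per-channel mean, divide by std.
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / std
```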