Dataset Meta-Learning from Kernel Ridge-Regression
Authors: Timothy Nguyen, Zhourong Chen, Jaehoon Lee
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform three sets of experiments to validate the efficacy of KIP and LS for dataset learning. We focus on MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009) datasets for comparison to previous methods. |
| Researcher Affiliation | Industry | Timothy Nguyen, Zhourong Chen, Jaehoon Lee, Google Research, {timothycnguyen, zrchen, jaehlee}@google.com |
| Pseudocode | Yes (see the KIP step sketch below) | Algorithm 1: Kernel Inducing Point (KIP) |
| Open Source Code | Yes | We provide an open source implementation of KIP and LS, available in an interactive Colab notebook: https://colab.research.google.com/github/google-research/google-research/blob/master/kip/KIP.ipynb |
| Open Datasets | Yes | We focus on MNIST (LeCun et al., 2010) and CIFAR-10 (Krizhevsky et al., 2009) datasets for comparison to previous methods. |
| Dataset Splits | No | We could have used a validation dataset for a stopping criterion, but that would have required reducing the target dataset from the entire training dataset. |
| Hardware Specification | Yes | using a single V100 GPU with 16GB of RAM |
| Software Dependencies | No | All our kernel-based experiments use the Neural Tangents library (Novak et al., 2020), built on top of JAX (Bradbury et al., 2018). |
| Experiment Setup | Yes (see the kernel sketch below) | In all KIP trainings, we used the Adam optimizer. All our labels are mean-centered 1-hot labels. We used learning rates 0.01 and 0.04 for the MNIST and CIFAR-10 datasets, respectively. When sampling target batches, we always do so in a class-balanced way. All datasets are preprocessed using channel-wise standardization (i.e. mean subtraction and division by standard-deviation). For neural (tangent) kernels, we always use weight and bias variance σ_w² = 2 and σ_b² = 10⁻⁴, respectively. For both neural kernels and neural networks, we always use ReLU activation. Convolutional layers all use a (3, 3) filter with stride 1 and same padding. We train KIP for 10-20k iterations and took 5 random subsets of images for initializations. |
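The Software Dependencies and Experiment Setup rows above pin down most of the kernel configuration. As a minimal sketch (not the authors' released code), such a kernel could be assembled with Neural Tangents on top of JAX roughly as follows. The (3, 3) convolutions with stride 1 and same padding, the ReLU activations, and the variances σ_w² = 2 and σ_b² = 10⁻⁴ come from the quoted excerpt; the depth of three convolutional blocks, the 128-channel width, and the 10-way readout are illustrative assumptions.

```python
# Sketch only: a convolutional NTK in Neural Tangents / JAX matching the quoted
# hyperparameters. Depth, channel width, and output dimension are assumptions.
from neural_tangents import stax

W_STD = 2.0 ** 0.5  # weight variance sigma_w^2 = 2    ->  W_std = sqrt(2)
B_STD = 1e-2        # bias variance   sigma_b^2 = 1e-4 ->  b_std = 1e-2

def conv_block(channels=128):
    # (3, 3) convolution, stride 1, SAME padding, followed by ReLU.
    return stax.serial(
        stax.Conv(channels, (3, 3), strides=(1, 1), padding='SAME',
                  W_std=W_STD, b_std=B_STD),
        stax.Relu(),
    )

# init_fn/apply_fn define the finite network; kernel_fn gives its NNGP/NTK.
init_fn, apply_fn, kernel_fn = stax.serial(
    conv_block(), conv_block(), conv_block(),
    stax.Flatten(),
    stax.Dense(10, W_std=W_STD, b_std=B_STD),
)
```

Since the kernel depends on depth and architecture, this should be read as a template for the quoted hyperparameters rather than the exact architectures evaluated in the paper.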
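The Pseudocode and Open Source Code rows point to Algorithm 1 (KIP) and the linked Colab notebook. The sketch below illustrates one KIP update consistent with that description: fit kernel ridge-regression on the learnable support set, measure squared error on a class-balanced target batch drawn from the training data, and step the support images with Adam. The ridge regularizer `reg`, the random-noise initialization, training images only (rather than jointly with labels), and the use of optax for Adam are assumptions made for illustration; the authors' Colab is the authoritative implementation.

```python
# Sketch only: one KIP training step, reusing kernel_fn from the sketch above.
import jax
import jax.numpy as jnp
import optax

def kip_loss(x_support, y_support, x_target, y_target, reg=1e-6):
    """Kernel ridge-regression loss of the support set evaluated on a target batch."""
    k_ss = kernel_fn(x_support, x_support, 'ntk')
    k_ts = kernel_fn(x_target, x_support, 'ntk')
    # Solve (K_ss + reg * I) alpha = y_support, then predict on the target batch.
    alpha = jnp.linalg.solve(k_ss + reg * jnp.eye(k_ss.shape[0]), y_support)
    preds = k_ts @ alpha
    return 0.5 * jnp.mean(jnp.sum((preds - y_target) ** 2, axis=-1))

# Learnable support set (shapes are illustrative; the paper initializes from
# random subsets of real training images rather than noise).
key = jax.random.PRNGKey(0)
x_support = jax.random.normal(key, (100, 32, 32, 3))
y_support = jnp.tile(jnp.eye(10), (10, 1)) - 0.1  # mean-centered one-hot labels

optimizer = optax.adam(learning_rate=4e-2)  # 0.04 is the quoted CIFAR-10 rate
opt_state = optimizer.init(x_support)

@jax.jit
def kip_step(x_support, opt_state, y_support, x_target, y_target):
    # Gradient of the ridge-regression loss with respect to the support images.
    loss, grads = jax.value_and_grad(kip_loss)(x_support, y_support,
                                               x_target, y_target)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(x_support, updates), opt_state, loss
```

In a training loop, `kip_step` would be called for the quoted 10-20k iterations, resampling a class-balanced target batch from the full training set at each step.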