Meta-Learning Update Rules for Unsupervised Representation Learning

Authors: Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTAL RESULTS
Researcher Affiliation | Collaboration | Luke Metz, Google Brain (lmetz@google.com); Niru Maheswaranathan, Google Brain (nirum@google.com); Brian Cheung, University of California, Berkeley (bcheung@berkeley.edu); Jascha Sohl-Dickstein, Google Brain (jaschasd@google.com)
Pseudocode | Yes | Algorithm 1: Distributed Training Algorithm
Open Source Code | Yes | Additionally, code and meta-trained parameters θ for our meta-learned Unsupervised Update are available at https://github.com/tensorflow/models/tree/master/research/learning_unsupervised_learning
Open Datasets | Yes | We construct a set of training tasks consisting of CIFAR10 (Krizhevsky and Hinton, 2009) and multi-class classification from subsets of classes from Imagenet (Russakovsky et al., 2015)... For evaluation, we use MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), IMDB (Maas et al., 2011)...
Dataset Splits | Yes | Our train set consists of Mini Imagenet, Alphabet, and Mini CIFAR. Our test sets are Mini Imagenet Test, Tiny Fashion MNIST, Tiny MNIST, and IMDB. ... In order to encourage the learning of features that generalize well, we estimate the linear regression weights on one minibatch {xa, ya} of K data points and evaluate the classification performance on a second minibatch {xb, yb}, also with K data points. (See the ridge-regression sketch after this table.)
Hardware Specification | No | Due to the small base models and the sequential nature of our compute workloads, we use multi-core CPUs as opposed to GPUs.
Software Dependencies | No | We implement the above models in distributed TensorFlow (Abadi et al., 2016).
Experiment Setup | Yes | We sample the number of layers uniformly between 2-5 and the number of units per layer logarithmically between 64 and 512. ... Training takes 8 days and consists of 200 thousand updates to θ with minibatch size 256. ... We use a learning rate schedule of 3e-4 for the first 100k steps, then 1e-4 for the next 50k steps, then 2e-5 for the remainder of meta-training. We use gradient clipping of norm 5 on minibatches of size 256. We compute our meta-objective by averaging 5 evaluations of the linear regression. We use a ridge penalty of 0.1 for all of this work. (See the setup sketches after this table.)
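
As a reading aid for the Dataset Splits and Experiment Setup rows above, the following is a minimal NumPy sketch of the few-shot linear-regression meta-objective the quoted text describes: ridge-regression weights are fit on features from one minibatch and classification performance is then measured on a second minibatch, using the ridge penalty of 0.1 from the setup. Function and variable names are illustrative, and the returned accuracy is a simplification of the paper's evaluation, not the released implementation.

import numpy as np

def linear_regression_meta_objective(feats_a, labels_a, feats_b, labels_b,
                                     ridge_penalty=0.1):
    # feats_*: [K, D] base-model features; labels_*: [K] integer class labels.
    num_classes = int(max(labels_a.max(), labels_b.max())) + 1
    targets_a = np.eye(num_classes)[labels_a]                # one-hot targets, [K, C]
    dim = feats_a.shape[1]
    # Closed-form ridge regression fit on minibatch A: W = (X^T X + lambda I)^-1 X^T Y
    gram = feats_a.T @ feats_a + ridge_penalty * np.eye(dim)
    weights = np.linalg.solve(gram, feats_a.T @ targets_a)   # [D, C]
    # Evaluate on the held-out minibatch B to reward features that generalize.
    preds = (feats_b @ weights).argmax(axis=1)
    return float((preds == labels_b).mean())

Per the setup row, the paper averages 5 such evaluations to form the meta-objective; backpropagating meta-gradients through it would require a differentiable surrogate rather than the argmax accuracy used in this sketch.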
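
Two further details from the Experiment Setup row can be written out compactly: the base-architecture sampling (depth uniform over 2-5 layers, width log-uniform between 64 and 512 units) and the piecewise-constant meta-training learning-rate schedule. The code below is an illustrative reconstruction of the quoted description under those assumptions, not the authors' code.

import numpy as np

def sample_base_architecture(rng=np.random):
    # Depth drawn uniformly from 2-5 layers; units per layer drawn
    # log-uniformly between 64 and 512, as quoted above.
    num_layers = rng.randint(2, 6)
    return [int(np.exp(rng.uniform(np.log(64), np.log(512))))
            for _ in range(num_layers)]

def meta_learning_rate(step):
    # 3e-4 for the first 100k meta-training steps, 1e-4 for the next 50k,
    # then 2e-5 for the remainder of meta-training.
    if step < 100_000:
        return 3e-4
    if step < 150_000:
        return 1e-4
    return 2e-5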