Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Authors: Alex Tamkin, Mike Wu, Noah Goodman

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
Remarkably, when pretraining on CIFAR-10, our learned views enable comparable transfer accuracy to the well-tuned SimCLR augmentations despite not including transformations like cropping or color jitter. Furthermore, our learned views significantly outperform baseline augmentations on speech recordings (+9 points on average) and wearable sensor data (+17 points on average).

Researcher Affiliation | Academia
Department of Computer Science, Stanford University, Stanford, CA 94305, USA. {atamkin, wumike, ngoodman}@stanford.edu

Pseudocode | Yes
Algorithm 1: Generating viewmaker views
Input: viewmaker network V, C×W×H image X, ℓ1 distortion budget ε, noise δ
Output: perturbed C×W×H image X
    P ← V(X, δ)           // generate perturbation
    P ← εCWH · P / |P|₁   // project to ℓ1 sphere
    X ← X + P             // apply perturbation
    X ← clamp(X, 0, 1)    // clamp (images only)

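The projection step rescales the raw perturbation so that its ℓ1 norm equals the budget εCWH. Below is a minimal PyTorch sketch of this procedure, assuming a `viewmaker` module that maps (image, noise) to a same-shape perturbation; the function name and signature are illustrative, not the released API:

```python
import torch

def generate_view(viewmaker, x, eps=0.05, noise=None):
    # Sketch of Algorithm 1: apply an L1-budgeted learned perturbation.
    # `viewmaker` and this signature are assumptions for illustration.
    c, w, h = x.shape[-3:]
    if noise is None:
        noise = torch.randn_like(x)  # fresh noise so repeated calls yield different views
    p = viewmaker(x, noise)          # P <- V(X, delta): generate perturbation
    # Project P onto the L1 sphere of radius eps * C * W * H.
    p = eps * c * w * h * p / p.abs().sum(dim=(-3, -2, -1), keepdim=True)
    return (x + p).clamp(0.0, 1.0)   # apply perturbation and clamp (images only)
```
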
Open Source Code | Yes
Code is available at https://github.com/alextamkin/viewmaker.

Open Datasets | Yes
We pretrain ResNet-18 (He et al., 2015) models on CIFAR-10 (Krizhevsky, 2009) for 200 epochs... We train on the Librispeech dataset (Panayotov et al., 2015) for 200 epochs... We consider the Pamap2 dataset (Reiss & Stricker, 2012)...

Dataset Splits | Yes
We use the standard linear evaluation protocol... using the same train/validation/test splits as prior work (Moya Rueda et al., 2018). We train a linear classifier on the frozen encoder representations for 50 epochs, reporting accuracy on the validation set.

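The linear evaluation protocol freezes the pretrained encoder and trains only a linear classifier on its features. A minimal sketch under that assumption follows; the helper name, optimizer, and learning rate are illustrative choices, not details reported in the paper:

```python
import torch
from torch import nn

def linear_eval(encoder, train_loader, feat_dim, num_classes, epochs=50):
    # Freeze the pretrained encoder; only the linear head is trained.
    encoder.eval()
    for param in encoder.parameters():
        param.requires_grad = False
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.01, momentum=0.9)  # illustrative hyperparameters
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)        # frozen representations
            loss = loss_fn(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```
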
Hardware Specification | No
The paper does not provide hardware details such as GPU or CPU models, memory, or the cloud/cluster configurations used for experiments.

Software Dependencies | No
The paper cites software such as PyTorch and PyTorch Lightning but does not specify their version numbers or the versions of other software dependencies.

Experiment Setup | Yes
We pretrain ResNet-18 (He et al., 2015) models on CIFAR-10 (Krizhevsky, 2009) for 200 epochs with a batch size of 256. We train a viewmaker-encoder system with a distortion budget of ε = 0.05. We tried distortion budgets ε ∈ {0.1, 0.05, 0.02} and found 0.05 to work best; however, we anticipate that further tuning would yield additional gains.
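
For quick reference, the reported CIFAR-10 pretraining setup can be collected into a single config; the field names in this hypothetical snippet are ours, not taken from the released code:

```python
# Hypothetical config summarizing the reported CIFAR-10 pretraining setup.
pretrain_config = {
    "encoder": "resnet18",       # He et al., 2015
    "dataset": "cifar10",        # Krizhevsky, 2009
    "epochs": 200,
    "batch_size": 256,
    "distortion_budget": 0.05,   # best of the tried values {0.1, 0.05, 0.02}
}
```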