Temporal Ensembling for Semi-Supervised Learning

Authors: Samuli Laine, Timo Aila

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility (Variable — Result — LLM Response)
Research Type — Experimental. "Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations."
Researcher Affiliation — Industry. Samuli Laine, NVIDIA (slaine@nvidia.com); Timo Aila, NVIDIA (taila@nvidia.com).
Pseudocode — Yes. Algorithm 1: Π-model pseudocode.

    Require: x_i = training stimuli
    Require: L = set of training input indices with known labels
    Require: y_i = labels for labeled inputs i ∈ L
    Require: w(t) = unsupervised weight ramp-up function
    Require: f_θ(x) = stochastic neural network with trainable parameters θ
    Require: g(x) = stochastic input augmentation function
    for t in [1, num_epochs] do
        for each minibatch B do
            z_{i∈B}  ← f_θ(g(x_{i∈B}))    ▷ evaluate network outputs for augmented inputs
            z̃_{i∈B} ← f_θ(g(x_{i∈B}))    ▷ again, with different dropout and augmentation
            loss ← −(1/|B|) Σ_{i∈(B∩L)} log z_i[y_i]            ▷ supervised loss component
                   + w(t) · (1/(C|B|)) Σ_{i∈B} ||z_i − z̃_i||²   ▷ unsupervised loss component
            update θ using, e.g., ADAM    ▷ update network parameters
        end for
    end for
    return θ
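As a sanity check on the loss in Algorithm 1, here is a minimal NumPy sketch of the combined Π-model loss for one minibatch. It assumes the two stochastic forward passes have already produced softmax outputs z and z̃; the function and variable names are illustrative, not taken from the released code.

```python
import numpy as np

def pi_model_loss(z, z_tilde, labels, labeled_mask, w_t):
    """Π-model loss for one minibatch, following Algorithm 1.

    z, z_tilde   : (|B|, C) softmax outputs from two stochastic forward passes
    labels       : (|B|,) integer class labels (entries for unlabeled inputs ignored)
    labeled_mask : (|B|,) bool, True where the label is known (i in B ∩ L)
    w_t          : unsupervised weight w(t) at the current epoch
    """
    B, C = z.shape
    # Supervised component: cross-entropy over B ∩ L, normalized by |B|.
    idx = np.flatnonzero(labeled_mask)
    supervised = -np.sum(np.log(z[idx, labels[idx]])) / B
    # Unsupervised component: mean squared difference of the two predictions,
    # normalized by the number of classes C and the minibatch size.
    unsupervised = np.sum((z - z_tilde) ** 2) / (C * B)
    return supervised + w_t * unsupervised
```

Note that when the two passes agree exactly, the unsupervised term vanishes and only the cross-entropy over the labeled subset remains.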
Open Source Code — Yes. "Our implementation is written in Python using Theano (Theano Development Team, 2016) and Lasagne (Dieleman et al., 2015), and is available at https://github.com/smlaine2/tempens."
Open Datasets — Yes. "We test the Π-model and temporal ensembling in two image classification tasks, CIFAR-10 and SVHN, and report the mean and standard deviation of 10 runs using different random seeds." The paper additionally uses CIFAR-100: "The CIFAR-100 dataset consists of 32 × 32 pixel RGB images from a hundred classes."
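The temporal ensembling variant evaluated alongside the Π-model replaces the second network pass with an exponential moving average of past per-epoch predictions. A minimal sketch of that accumulation, using the paper's decay constant α = 0.6 and its Adam-style startup bias correction (names are illustrative):

```python
import numpy as np

def update_ensemble(Z, z, epoch, alpha=0.6):
    """Accumulate this epoch's predictions z into the ensemble output Z and
    return the bias-corrected training target z_tilde.

    Z     : running ensemble of predictions (same shape as z)
    z     : network predictions from the current epoch
    epoch : 0-based epoch index, used for the startup correction
    alpha : ensembling momentum (0.6 in the paper)
    """
    Z = alpha * Z + (1.0 - alpha) * z          # exponential moving average
    z_tilde = Z / (1.0 - alpha ** (epoch + 1)) # correct the zero-initialization bias
    return Z, z_tilde
```

With the bias correction, the target after the very first epoch equals that epoch's predictions exactly, rather than a down-scaled copy.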
Dataset Splits — No. The paper refers to "training stimuli" and "labeled inputs" for training and uses test sets for evaluation, but does not explicitly describe a separate validation split, its size, or how it is partitioned from the training data.
Hardware Specification — No. The paper details the software implementation but gives no hardware specifications, such as the GPU or CPU models used to run the experiments.
Software Dependencies — No. "Our implementation is written in Python using Theano (Theano Development Team, 2016) and Lasagne (Dieleman et al., 2015)." The software is named, but specific version numbers for Theano and Lasagne are not provided.
Experiment Setup — Yes. "All networks were trained using Adam (Kingma & Ba, 2014) with a maximum learning rate of λ_max = 0.003, except for temporal ensembling in the SVHN case where a maximum learning rate of λ_max = 0.001 worked better. Adam momentum parameters were set to β_1 = 0.9 and β_2 = 0.999 as suggested in the paper. The maximum value for the unsupervised loss component was scaled to w_max · M/N, where M is the number of labeled inputs and N is the total number of training inputs. For Π-model runs, we used w_max = 100 in all runs except for CIFAR-100 with Tiny Images where we set w_max = 300. For temporal ensembling we used w_max = 30 in most runs. All networks were trained for 300 epochs with minibatch size of 100."
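The weight schedule w(t) described above can be sketched as follows. The Gaussian ramp-up shape exp(−5(1 − T)²) over the first 80 epochs follows the paper's appendix, and the M/N scaling follows the quoted setup; the function names are illustrative.

```python
import math

def ramp_up(epoch, ramp_length=80):
    """Gaussian ramp-up curve exp(-5(1 - T)^2), with T = epoch / ramp_length
    clamped to [0, 1]; the curve reaches 1.0 once the ramp-up period ends."""
    T = min(epoch / ramp_length, 1.0)
    return math.exp(-5.0 * (1.0 - T) ** 2)

def unsupervised_weight(epoch, w_max, num_labeled, num_total):
    """w(t): ramp the unsupervised loss weight toward w_max * M/N."""
    return w_max * (num_labeled / num_total) * ramp_up(epoch)
```

For example, with CIFAR-10's 4000 labels out of 50000 training inputs and w_max = 100, the weight ramps from near zero toward its plateau of 100 · 4000/50000 = 8.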