Temporal Ensembling for Semi-Supervised Learning
Authors: Samuli Laine, Timo Aila
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. |
| Researcher Affiliation | Industry | Samuli Laine NVIDIA slaine@nvidia.com Timo Aila NVIDIA taila@nvidia.com |
| Pseudocode | Yes | Algorithm 1 Π-model pseudocode. Require: x_i = training stimuli; L = set of training input indices with known labels; y_i = labels for labeled inputs i ∈ L; w(t) = unsupervised weight ramp-up function; f_θ(x) = stochastic neural network with trainable parameters θ; g(x) = stochastic input augmentation function. for t in [1, num_epochs] do: for each minibatch B do: z_{i∈B} ← f_θ(g(x_{i∈B})) ▷ evaluate network outputs for augmented inputs; z̃_{i∈B} ← f_θ(g(x_{i∈B})) ▷ again, with different dropout and augmentation; loss ← −(1/\|B\|) Σ_{i∈(B∩L)} log z_i[y_i] (supervised loss component) + w(t)·(1/(C\|B\|)) Σ_{i∈B} ‖z_i − z̃_i‖² (unsupervised loss component); update θ using, e.g., ADAM ▷ update network parameters; end for; end for; return θ |
| Open Source Code | Yes | Our implementation is written in Python using Theano (Theano Development Team, 2016) and Lasagne (Dieleman et al., 2015), and is available at https://github.com/smlaine2/tempens. |
| Open Datasets | Yes | We test the Π-model and temporal ensembling in two image classification tasks, CIFAR-10 and SVHN, and report the mean and standard deviation of 10 runs using different random seeds. The CIFAR-100 dataset consists of 32 × 32 pixel RGB images from a hundred classes. |
| Dataset Splits | No | The paper refers to 'training stimuli' and 'labeled inputs' for training, and uses 'test' sets for evaluation, but does not explicitly describe a separate validation set split, its size, or how it's partitioned from the training data. |
| Hardware Specification | No | The paper describes the software implementation details, but does not provide any specific hardware specifications such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | Our implementation is written in Python using Theano (Theano Development Team, 2016) and Lasagne (Dieleman et al., 2015). While software is mentioned, specific version numbers for Theano and Lasagne are not provided. |
| Experiment Setup | Yes | All networks were trained using Adam (Kingma & Ba, 2014) with a maximum learning rate of λmax = 0.003, except for temporal ensembling in the SVHN case where a maximum learning rate of λmax = 0.001 worked better. Adam momentum parameters were set to β1 = 0.9 and β2 = 0.999 as suggested in the paper. The maximum value for the unsupervised loss component was set to wmax · M/N, where M is the number of labeled inputs and N is the total number of training inputs. For Π-model runs, we used wmax = 100 in all runs except for CIFAR-100 with Tiny Images where we set wmax = 300. For temporal ensembling we used wmax = 30 in most runs. All networks were trained for 300 epochs with minibatch size of 100. |
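The Π-model loss quoted in the Pseudocode row maps directly onto a short training step. The following is a minimal sketch in PyTorch rather than the paper's Theano/Lasagne implementation; `model`, `augment`, and `labeled_mask` are illustrative placeholders, and the sketch assumes the network outputs logits that are turned into class probabilities with a softmax.

```python
import torch
import torch.nn.functional as F

def pi_model_step(model, optimizer, x, y, labeled_mask, w_t, augment):
    """One Π-model minibatch update, following the structure of Algorithm 1.

    x            -- minibatch of inputs, shape (B, ...)
    y            -- class indices; only entries where labeled_mask is True are used
    labeled_mask -- boolean tensor of shape (B,) marking labeled inputs (B ∩ L)
    w_t          -- unsupervised weight w(t) for the current epoch
    augment      -- stochastic input augmentation g(x)
    """
    model.train()  # dropout stays active so the two branches differ

    # Two stochastic forward passes: different augmentation and dropout noise.
    logits_a = model(augment(x))
    logits_b = model(augment(x))
    probs_a = F.softmax(logits_a, dim=1)
    probs_b = F.softmax(logits_b, dim=1)

    batch_size, num_classes = probs_a.shape

    # Supervised term: -(1/|B|) * sum over labeled inputs of log z_i[y_i].
    sup_loss = F.cross_entropy(
        logits_a[labeled_mask], y[labeled_mask], reduction="sum"
    ) / batch_size

    # Unsupervised consistency term: (1/(C|B|)) * sum_i ||z_i - z~_i||^2.
    unsup_loss = ((probs_a - probs_b) ** 2).sum() / (num_classes * batch_size)

    loss = sup_loss + w_t * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For temporal ensembling the second branch is replaced by a per-example exponential moving average of past network predictions (bias-corrected and treated as a constant target), so only one forward pass per input is needed per epoch.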
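The Experiment Setup row fixes the optimizer settings and the scaling of the unsupervised weight, but not the shape of the ramp-up. The sketch below encodes only the quoted values (Adam with λmax = 0.003, or 0.001 for temporal ensembling on SVHN; β1 = 0.9, β2 = 0.999; wmax scaled by M/N; 300 epochs, minibatch size 100); the Gaussian ramp-up curve and the 80-epoch ramp-up period are assumptions not quoted in the table and should be checked against the paper.

```python
import math
import torch

def unsupervised_weight(epoch, w_max, num_labeled, num_total, rampup_epochs=80):
    """w(t): unsupervised loss weight at a given epoch.

    The maximum value is scaled by M/N (labeled inputs / total training inputs),
    as quoted above. The Gaussian ramp-up shape exp(-5 * (1 - t)^2) over the
    first `rampup_epochs` epochs is an assumption, not quoted in the table.
    """
    scaled_max = w_max * num_labeled / num_total
    if epoch >= rampup_epochs:
        return scaled_max
    t = epoch / rampup_epochs
    return scaled_max * math.exp(-5.0 * (1.0 - t) ** 2)

def make_optimizer(model, temporal_ensembling_svhn=False):
    """Adam as quoted: maximum learning rate 0.003 (0.001 for temporal ensembling
    on SVHN), beta1 = 0.9, beta2 = 0.999. Learning-rate ramp-up/ramp-down
    scheduling is omitted from this sketch."""
    max_lr = 0.001 if temporal_ensembling_svhn else 0.003
    return torch.optim.Adam(model.parameters(), lr=max_lr, betas=(0.9, 0.999))

# Quoted training schedule: 300 epochs, minibatch size 100.
NUM_EPOCHS = 300
BATCH_SIZE = 100
# Example: a Π-model run with wmax = 100, 4000 labels and 50000 training images
# gives an effective maximum weight of 100 * 4000 / 50000 = 8.
```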