Time-Consistent Self-Supervision for Semi-Supervised Learning

Authors: Tianyi Zhou, Shengjie Wang, Jeff Bilmes

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that TC-SSL outperforms the very recent MixMatch and other SSL approaches on three datasets (CIFAR10, CIFAR100, and STL10) under various labeled-unlabeled splits and significantly improves SSL efficiency, i.e., it consistently uses < 20% of the training batches that the best baseline needs.
Researcher Affiliation | Academia | University of Washington, Seattle. Correspondence to: Tianyi Zhou <tianyizh@uw.edu>, Shengjie Wang <wangsj@uw.edu>, Jeff A. Bilmes <bilmes@uw.edu>.
Pseudocode | Yes | We provide the complete description of TC-SSL in Algorithm 1.
Open Source Code | No | No explicit statement about releasing source code for their method or a link to a code repository was found.
Open Datasets | Yes | CIFAR10, CIFAR100 (Krizhevsky & Hinton, 2009), and STL10 (Coates et al., 2011).
Dataset Splits | Yes | For CIFAR10 experiments, we train a small WideResNet-28-2 (28 layers, width factor of 2, 1.5 million parameters) and a large WideResNet-28-135 (28 layers, 135 filters per layer, 26 million parameters) for four kinds of labeled/unlabeled/validation random splits applied to the original training set of CIFAR10, i.e., 500/44500/5000, 1000/44000/5000, 2000/43000/5000, and 4000/41000/5000 (a minimal sketch of such a split appears after this table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided.
Software Dependencies | No | Only 'PyTorch' is mentioned as a software dependency, without a specific version number.
Experiment Setup | Yes | For TC-SSL in the experiments, we apply T0 = 10 warm-starting epochs and T = 680 epochs in total. Note that an epoch here refers to one iteration of Algorithm 1 and differs from its meaning in most fully supervised training, where it refers to a full pass over the whole training set; in our case, the training samples in each epoch change according to our curriculum of kt. We apply SGD with momentum of 0.9 and weight decay of 2 × 10⁻⁵, and use a modified cosine-annealing learning-rate schedule (Loshchilov & Hutter, 2017) over multiple episodes of increasing length and decaying target learning rate, since it can quickly jump between different local minima on the loss landscape and explore more regions without being trapped in a bad local minimum. In particular, we set up 12 episodes, with epochs-per-episode starting from 10 (i.e., the warm-starting episode) and increasing by 10 after every episode until reaching epoch 680. The learning rates at the beginning and end of the first episode are set to 0.2 and 0.02, respectively; we then multiply each of them by 0.9 after every episode. We do not heavily tune the λ-parameters and γ-parameters: for all experiments, we use λcs = 20/C, λct = 0.2, λce = 1.0, γθ = γc = 0.99, and γk = 0.005, where C is the number of classes. For data augmentation, we use AutoAugment (Cubuk et al., 2019a) learned policies for the three datasets, followed by MixUp with the mixing weight sampled from Beta(0.5, 0.5). We initialize k1 = 0.1|U| and θ0 with the PyTorch default initialization. We apply all the practical tips detailed in Section 3.3. (Hedged sketches of the learning-rate schedule and the MixUp step appear after this table.)
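
The dataset splits above are reported as counts only; the paper excerpt does not say whether they are drawn class-balanced or uniformly at random. The following is a minimal sketch of the 4000/41000/5000 case, assuming a uniform random split of the 50,000-image CIFAR10 training set via torchvision's loader; the helper name random_split_indices is hypothetical, not the authors' released script.

```python
# Assumed reconstruction of a labeled/unlabeled/validation split of the
# CIFAR10 training set (here 4000/41000/5000); a sketch, not the paper's code.
import numpy as np
from torchvision import datasets

def random_split_indices(num_samples, num_labeled, num_val, seed=0):
    """Return disjoint index arrays (labeled, unlabeled, validation)."""
    rng = np.random.RandomState(seed)
    perm = rng.permutation(num_samples)
    labeled = perm[:num_labeled]
    val = perm[num_labeled:num_labeled + num_val]
    unlabeled = perm[num_labeled + num_val:]
    return labeled, unlabeled, val

train_set = datasets.CIFAR10(root="./data", train=True, download=True)
labeled_idx, unlabeled_idx, val_idx = random_split_indices(
    len(train_set), num_labeled=4000, num_val=5000)
print(len(labeled_idx), len(unlabeled_idx), len(val_idx))  # 4000 41000 5000
```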
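
The multi-episode schedule in the setup row can be read as: within each episode the learning rate is cosine-annealed from a start value to a target value, and both endpoints are multiplied by 0.9 when a new episode begins, with episode lengths 10, 20, 30, ... epochs. Below is a minimal sketch under that reading; the function name is hypothetical, and the exact episode lengths are an assumption, since the quoted totals (12 episodes, 680 epochs) do not pin down the last episode.

```python
# Hedged sketch of a multi-episode cosine-annealing learning-rate schedule
# (in the style of Loshchilov & Hutter, 2017): first-episode lr goes from
# 0.2 to 0.02, and both endpoints are multiplied by 0.9 after every episode.
# Episode lengths are an assumption based on the quoted description.
import math

def tc_ssl_learning_rate(epoch, lr_start=0.2, lr_end=0.02, decay=0.9,
                         episode_lengths=tuple(10 * (i + 1) for i in range(12))):
    """Return the learning rate for a 0-indexed training epoch."""
    start, end = lr_start, lr_end
    episode_begin = 0
    for length in episode_lengths:
        if epoch < episode_begin + length:
            # Cosine annealing from `start` down to `end` within this episode.
            progress = (epoch - episode_begin) / max(length - 1, 1)
            return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))
        episode_begin += length
        start *= decay  # both endpoints decay after every episode
        end *= decay
    return end  # after the last episode, stay at the final target rate

print(tc_ssl_learning_rate(0))  # 0.2 (start of the warm-starting episode)
print(tc_ssl_learning_rate(9))  # 0.02 (end of the first 10-epoch episode)
```

In a PyTorch training loop, this value could simply be written into the param groups of an SGD optimizer configured with momentum 0.9 and weight decay 2 × 10⁻⁵ at the start of each epoch.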
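
The augmentation pipeline also applies MixUp with a Beta(0.5, 0.5) mixing weight after the AutoAugment policies. A minimal batch-level MixUp sketch follows; the function name, the one-hot-target convention, and mixing each batch with a permuted copy of itself are assumptions rather than the paper's exact implementation.

```python
# Minimal MixUp sketch with the mixing weight sampled from Beta(0.5, 0.5).
# Mixing a batch with a randomly permuted copy of itself is one common
# convention; the paper's exact pairing strategy is not specified here.
import torch

def mixup_batch(inputs, targets_onehot, alpha=0.5):
    """Mix inputs and (one-hot or soft) targets with a Beta(alpha, alpha) weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(inputs.size(0))
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[index]
    mixed_targets = lam * targets_onehot + (1.0 - lam) * targets_onehot[index]
    return mixed_inputs, mixed_targets
```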