Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

Authors: Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that pseudo-labeling can in fact be competitive with the state-of-the-art, while being more resilient to out-of-distribution samples in the unlabeled set. We empirically demonstrate through extensive experiments that an implementation of pseudo-labeling trained under curriculum labeling achieves comparable performance against many other recent methods.
Researcher Affiliation | Academia | Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez, University of Virginia, {pc9za, fuwen.tan, yanjun, vicente}@virginia.edu
Pseudocode | Yes | Algorithm 1: Pseudo-Labeling under Curriculum Labeling (a hedged sketch of this procedure appears below the table)
Open Source Code | No | The paper does not provide any specific links to source code repositories or explicit statements about code release for the described methodology.
Open Datasets | Yes | Datasets: We evaluate the proposed approach on three image classification datasets: CIFAR-10 (Krizhevsky 2012), Street View House Numbers (SVHN) (Netzer et al. 2011), and ImageNet ILSVRC (Russakovsky et al. 2015; Deng et al. 2009).
Dataset Splits | Yes | With CIFAR-10 we use 4,000 labeled samples and 46,000 unlabeled samples for training and validation, and evaluate on 10,000 test samples. With SVHN we use 1,000 labeled samples and 71,257 unlabeled samples for training, 1,000 samples for validation, which is significantly lower than the conventional 7,325 samples generally used, and evaluate on 26,032 test samples. With ImageNet we use 10% of the dataset as labeled samples (102,000 for training and 26,000 for validation), 1,253,167 unlabeled samples, and 50,000 test samples. (A split-construction sketch appears below the table.)
Hardware Specification | No | The paper describes the neural network architectures used (e.g., CNN-13, Wide ResNet-28, ResNet-50) but does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper describes the training methodology, including optimizers and regularization techniques, but does not provide specific software dependencies with version numbers (e.g., programming language versions, library versions).
Experiment Setup | Yes | The networks are optimized using Stochastic Gradient Descent with Nesterov momentum. We use weight decay regularization of 0.0005, a momentum factor of 0.9, and an initial learning rate of 0.1, which is then updated by cosine annealing (Loshchilov and Hutter 2016). Note that we use the same hyper-parameter setting for all of our experiments, except the batch size when applying moderate and heavy data augmentation. We empirically observe that small batches (i.e., 64-100) work better for moderate data augmentation (random cropping, padding, whitening, and horizontal flipping), while large batches (i.e., 512-1024) work better for heavy data augmentation. For CIFAR-10 and SVHN, we train the models for 750 epochs. Starting from the 500th epoch, we also apply stochastic weight averaging (SWA) (Izmailov et al. 2018) every 5 epochs. For ImageNet, we train the network for 220 epochs and apply SWA starting from the 100th epoch. (A training-loop sketch of this recipe appears below the table.)
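
The pseudocode row above refers to Algorithm 1, Pseudo-Labeling under Curriculum Labeling. The Python sketch below illustrates the percentile-based self-training loop the paper describes: each cycle re-trains the model from scratch and admits a larger fraction of the unlabeled set by lowering a confidence-percentile threshold. `train_model` and `predict_probs` are hypothetical helpers, so treat this as an illustration of the scheme, not the authors' implementation.

```python
import numpy as np

def curriculum_labeling(train_model, predict_probs, x_l, y_l, x_u, cycles=5):
    """Sketch of Algorithm 1: self-training with a percentile-based
    pseudo-label threshold that admits 20% more of the unlabeled set
    each cycle. `train_model(x, y)` and `predict_probs(model, x)` are
    assumed helpers, not the authors' code."""
    pseudo_x = np.empty((0,) + x_u.shape[1:], dtype=x_u.dtype)
    pseudo_y = np.empty((0,), dtype=y_l.dtype)
    model = None
    for cycle in range(1, cycles + 1):
        # Re-train from a fresh initialization on labeled + pseudo-labeled
        # data (the paper restarts parameters each cycle to reduce
        # confirmation bias).
        model = train_model(np.concatenate([x_l, pseudo_x]),
                            np.concatenate([y_l, pseudo_y]))
        probs = predict_probs(model, x_u)   # (N, num_classes) softmax scores
        conf = probs.max(axis=1)            # top-1 confidence per sample
        # Keep samples above the (100 - 20*cycle)-th confidence percentile,
        # so 20%, 40%, ..., 100% of the unlabeled pool gets pseudo-labeled.
        threshold = np.percentile(conf, max(100 - 20 * cycle, 0))
        keep = conf >= threshold
        pseudo_x, pseudo_y = x_u[keep], probs[keep].argmax(axis=1)
    return model
```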
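
For the dataset splits row, a common way to realize a split such as CIFAR-10's 4,000 labeled / 46,000 unlabeled samples is class-balanced random sampling. This minimal NumPy sketch assumes that convention; the paper's exact sampling procedure may differ.

```python
import numpy as np

def make_ssl_split(y_train, num_labeled=4000, num_classes=10, seed=0):
    """Class-balanced labeled/unlabeled index split, e.g. 400 labeled
    samples per CIFAR-10 class; everything else goes to the unlabeled
    pool. This is an assumed convention, not the paper's released code."""
    rng = np.random.RandomState(seed)
    per_class = num_labeled // num_classes
    labeled_idx = []
    for c in range(num_classes):
        class_idx = np.where(np.asarray(y_train) == c)[0]
        labeled_idx.extend(rng.choice(class_idx, per_class, replace=False))
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(len(y_train)), labeled_idx)
    return labeled_idx, unlabeled_idx
```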
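
The experiment setup row translates almost directly into a training skeleton. The sketch below wires together the reported hyper-parameters (SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9, initial learning rate 0.1, cosine annealing, and SWA every 5 epochs late in training), assuming a PyTorch implementation; the model and the per-epoch training step are placeholders, since the paper releases neither.

```python
import torch
from torch.optim.swa_utils import AveragedModel

# Placeholder network standing in for CNN-13 / Wide ResNet-28 / ResNet-50.
model = torch.nn.Linear(32 * 32 * 3, 10)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4, nesterov=True)
epochs, swa_start = 750, 500  # CIFAR-10/SVHN setting; ImageNet uses 220 and 100
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
swa_model = AveragedModel(model)

for epoch in range(epochs):
    # train_one_epoch(model, optimizer)  # hypothetical per-epoch training step
    scheduler.step()
    if epoch >= swa_start and (epoch - swa_start) % 5 == 0:
        swa_model.update_parameters(model)  # average weights every 5 epochs
# Before evaluation, the SWA model's BatchNorm statistics would be recomputed,
# e.g. torch.optim.swa_utils.update_bn(train_loader, swa_model)
```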