Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning
Authors: Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that pseudo-labeling can in fact be competitive with the state-of-the-art, while being more resilient to out-of-distribution samples in the unlabeled set. We empirically demonstrate through extensive experiments that an implementation of pseudo-labeling trained under curriculum labeling achieves comparable performance against many other recent methods. |
| Researcher Affiliation | Academia | Paola Cascante-Bonilla, Fuwen Tan, Yanjun Qi, Vicente Ordonez University of Virginia {pc9za, fuwen.tan, yanjun, vicente}@virginia.edu |
| Pseudocode | Yes | Algorithm 1 Pseudo-Labeling under Curriculum Labeling (see the sketch after this table) |
| Open Source Code | No | The paper does not provide any specific links to source code repositories or explicit statements about code release for the described methodology. |
| Open Datasets | Yes | Datasets: We evaluate the proposed approach on three image classification datasets: CIFAR-10 (Krizhevsky 2012), Street View House Numbers (SVHN) (Netzer et al. 2011), and ImageNet ILSVRC (Russakovsky et al. 2015; Deng et al. 2009). |
| Dataset Splits | Yes | With CIFAR-10 we use 4,000 labeled samples and 46,000 unlabeled samples for training and validation, and evaluate on 10,000 test samples. With SVHN we use 1,000 labeled samples and 71,257 unlabeled samples for training, 1,000 samples for validation, which is significantly lower than the conventional 7,325 samples generally used, and evaluate on 26,032 test samples. With ImageNet we use 10% of the dataset as labeled samples (102,000 for training and 26,000 for validation), 1,253,167 unlabeled samples and 50,000 test samples. |
| Hardware Specification | No | The paper describes the neural network architectures used (e.g., CNN-13, WideResNet-28, ResNet-50) but does not provide any specific details about the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper describes the training methodology, including optimizers and regularization techniques, but does not provide specific software dependencies with version numbers (e.g., programming language versions, library versions). |
| Experiment Setup | Yes | The networks are optimized using Stochastic Gradient Descent with Nesterov momentum. We use weight decay regularization of 0.0005, momentum factor of 0.9, and an initial learning rate of 0.1 which is then updated by cosine annealing (Loshchilov and Hutter 2016). Note that we use the same hyper-parameter setting for all of our experiments, except the batch size when applying moderate and heavy data augmentation. We empirically observe that small batches (i.e. 64-100) work better for moderate data augmentation (random cropping, padding, whitening and horizontal flipping), while large batches (i.e. 512-1024) work better for heavy data augmentation. For CIFAR-10 and SVHN, we train the models for 750 epochs. Starting from the 500th epoch, we also apply stochastic weight averaging (SWA) (Izmailov et al. 2018) every 5 epochs. For ImageNet, we train the network for 220 epochs and apply SWA, starting from the 100th epoch. (A configuration sketch follows the table.) |
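
The Pseudocode row quotes Algorithm 1, which the paper describes as iterative self-training with a confidence-percentile threshold that is relaxed each cycle and a model re-initialization before every cycle. The following is a minimal Python sketch of that loop, not the authors' code: `build_model`, `train`, and `predict` are hypothetical helpers, and the equal-step percentile schedule (20%, 40%, ..., 100%) is an assumption about how the threshold is relaxed.

```python
import numpy as np
import torch

def curriculum_labeling(build_model, train, predict,
                        x_l, y_l, x_u, num_steps=5):
    """Sketch of pseudo-labeling under curriculum labeling.

    build_model(), train(), and predict() are assumed helpers (not from
    the paper): build_model() returns a freshly initialized network,
    train() fits it on the given tensors, and predict() returns class
    probabilities for x_u as a detached CPU tensor of shape (N, C).
    """
    pseudo_x, pseudo_y = [], []
    for step in range(1, num_steps + 1):
        # Re-initialize the model at every cycle, as the paper does,
        # so earlier (noisier) pseudo-labels do not accumulate bias.
        model = build_model()
        xs = torch.cat([x_l] + pseudo_x) if pseudo_x else x_l
        ys = torch.cat([y_l] + pseudo_y) if pseudo_y else y_l
        train(model, xs, ys)

        # Score all unlabeled samples; keep only those whose maximum
        # class probability falls in the current top percentile. The
        # threshold is relaxed each cycle until every sample is used.
        probs = predict(model, x_u)
        conf, pred = probs.max(dim=1)
        keep_frac = step / num_steps          # assumed 20% increments
        threshold = np.percentile(conf.numpy(), 100 * (1 - keep_frac))
        mask = conf >= threshold
        pseudo_x, pseudo_y = [x_u[mask]], [pred[mask]]
    return model
```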
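
The Experiment Setup row gives concrete optimizer hyper-parameters (SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9, initial learning rate 0.1 with cosine annealing, and SWA every 5 epochs past a warm-up point). A PyTorch sketch of that configuration is below; `make_training_setup` and `train_one_epoch` are hypothetical names, and stepping the cosine schedule once per epoch is an assumption the paper does not spell out.

```python
import torch
from torch.optim.swa_utils import AveragedModel

def make_training_setup(model, epochs=750):
    """Build optimizer, LR schedule, and SWA wrapper per the reported
    hyper-parameters (CIFAR-10/SVHN setting: 750 epochs, SWA from 500)."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,                 # initial learning rate from the paper
        momentum=0.9,           # momentum factor from the paper
        nesterov=True,          # Nesterov momentum
        weight_decay=5e-4,      # weight decay regularization
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs
    )
    swa_model = AveragedModel(model)  # holds the SWA running average
    return optimizer, scheduler, swa_model

# Usage sketch (train_one_epoch is a hypothetical helper):
# optimizer, scheduler, swa_model = make_training_setup(model)
# for epoch in range(750):
#     train_one_epoch(model, optimizer)
#     scheduler.step()                      # assumed per-epoch stepping
#     if epoch >= 500 and epoch % 5 == 0:   # SWA every 5 epochs from 500
#         swa_model.update_parameters(model)
```

For ImageNet the same setup would apply with 220 epochs and SWA starting at epoch 100, per the quoted setup.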