Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training
Authors: Kai Sheng Tai, Peter D Bailis, Gregory Valiant
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Our main baseline for comparison is the FixMatch algorithm (Sohn et al., 2020) since it is a state-of-the-art method for semi-supervised image classification. For each configuration, we report the mean and standard deviation of the error rate across 5 independent trials. |
| Researcher Affiliation | Academia | Stanford University, Stanford, CA, USA. Correspondence to: Kai Sheng Tai <kst@cs.stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Sinkhorn Label Allocation (SLA) and Algorithm 2 Self-training with Sinkhorn Label Allocation and consistency regularization |
| Open Source Code | Yes | Our code is available at https://github.com/stanford-futuredata/sinkhorn-label-allocation. |
| Open Datasets | Yes | We used the CIFAR-10, CIFAR-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) image classification datasets with their standard train/test splits. |
| Dataset Splits | No | The paper mentions 'standard train/test splits' for CIFAR-10, CIFAR-100, and SVHN, and details how labeled and unlabeled examples are sampled from the training split. However, it does not explicitly describe a separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper describes various methods and architectures (e.g., 'RandAugment', 'WideResNet-28-2 architecture'), but it does not specify version numbers for any software dependencies such as programming languages, libraries, or frameworks (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | Yes | We optimized our classifiers using the stochastic Nesterov accelerated gradient method with a momentum parameter of 0.9 and a cosine learning rate schedule given by 0.03 cos(7πt/(16T)), where t is the current iteration and T = 2²⁰ is the total number of iterations. We used a labeled batch size of 64, an unlabeled batch size of 448, weight decay of 5×10⁻⁴ on all parameters except biases and batch normalization weights, and unlabeled loss weight λ = 1. For hyperparameters specific to SLA, we used a Sinkhorn regularization parameter of γ = 100 and tolerance parameter ϵ_t = 0.01 c_t for Sinkhorn iteration, where c_t is the target column sum at iteration t. |
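
The optimizer and learning-rate settings quoted above translate directly into a short PyTorch configuration. The sketch below is illustrative only, not the authors' released code; names such as `build_model` are placeholders.

```python
# Minimal sketch of the quoted training setup: Nesterov SGD, cosine LR schedule
# 0.03 * cos(7*pi*t / (16*T)), and weight decay applied to all parameters except
# biases and batch-norm weights. `build_model` is a placeholder, not from the paper.
import math
import torch

T = 2 ** 20          # total training iterations
BASE_LR = 0.03       # peak learning rate
WEIGHT_DECAY = 5e-4  # applied only to non-bias, non-BN parameters
LABELED_BS, UNLABELED_BS = 64, 448
LAMBDA_U = 1.0       # unlabeled loss weight

def param_groups(model):
    """Split parameters so biases and batch-norm weights skip weight decay."""
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim == 1 or name.endswith(".bias"):  # BN weights and biases are 1-D
            no_decay.append(p)
        else:
            decay.append(p)
    return [
        {"params": decay, "weight_decay": WEIGHT_DECAY},
        {"params": no_decay, "weight_decay": 0.0},
    ]

model = build_model()  # e.g., a WideResNet-28-2 classifier (placeholder)
optimizer = torch.optim.SGD(param_groups(model), lr=BASE_LR,
                            momentum=0.9, nesterov=True)
# Cosine schedule: lr(t) = 0.03 * cos(7 * pi * t / (16 * T))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda t: math.cos(7 * math.pi * t / (16 * T)))
```

The SLA-specific hyperparameters (γ = 100 and a column-sum tolerance) refer to the Sinkhorn iteration used for label allocation. The generic Sinkhorn-style scaling loop below shows how such a stopping rule can be applied; it is a sketch only and omits the parts specific to the paper's Algorithm 1, such as the label annealing budget and the unassigned-mass column.

```python
# Generic Sinkhorn scaling with entropic regularization gamma and a column-sum
# tolerance stopping criterion. Illustrative only; not the paper's Algorithm 1.
import torch

def sinkhorn_allocate(cost, row_sums, col_sums, gamma=100.0, tol=1e-2, max_iters=1000):
    """Scale exp(-gamma * cost) so its row/column sums approach the target marginals."""
    K = torch.exp(-gamma * cost)            # (num_examples, num_classes) kernel
    u = torch.ones_like(row_sums)
    v = torch.ones_like(col_sums)
    for _ in range(max_iters):
        u = row_sums / (K @ v)              # row scaling
        v = col_sums / (K.t() @ u)          # column scaling
        plan = u[:, None] * K * v[None, :]
        # Stop once every column sum is within the tolerance of its target.
        if torch.all((plan.sum(dim=0) - col_sums).abs() <= tol):
            break
    return plan
```

In practice, iterations at a large regularization parameter such as γ = 100 are usually carried out in the log domain to avoid numerical underflow.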