FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning
Authors: Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing Xie
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments indicate the superiority of FreeMatch especially when the labeled data are extremely rare. FreeMatch achieves 5.78%, 13.59%, and 1.28% error rate reduction over the latest state-of-the-art method FlexMatch on CIFAR-10 with 1 label per class, STL-10 with 4 labels per class, and ImageNet with 100 labels per class, respectively. Moreover, FreeMatch can also boost the performance of imbalanced SSL. The code can be found at https://github.com/microsoft/Semi-supervised-learning. |
| Researcher Affiliation | Collaboration | Yidong Wang (1,2), Hao Chen (3), Qiang Heng (4), Wenxin Hou (5), Yue Fan (6), Zhen Wu (7), Jindong Wang (1), Marios Savvides (3), Takahiro Shinozaki (2), Bhiksha Raj (3,8), Bernt Schiele (6), Xing Xie (1). Affiliations: (1) Microsoft Research Asia, (2) Tokyo Institute of Technology, (3) Carnegie Mellon University, (4) North Carolina State University, (5) Microsoft STCA, (6) Max Planck Institute for Informatics, Saarland Informatics Campus, (7) Nanjing University, (8) Mohamed bin Zayed University of AI |
| Pseudocode | Yes | Algorithm 1: FreeMatch at the t-th iteration. Input: number of classes C, labeled batch X = {(x_b, y_b) : b ∈ (1, ..., B)}, unlabeled batch U = {u_b : b ∈ (1, ..., μB)}, unsupervised loss weight w_u, fairness loss weight w_f, and EMA decay λ. (1) Compute the supervised loss L_s = (1/B) Σ_{b=1..B} H(y_b, p_m(y\|ω(x_b))). (2) Update the global threshold τ_t = λ·τ_{t-1} + (1-λ)·(1/μB) Σ_{b=1..μB} max(q_b), where q_b abbreviates p_m(y\|ω(u_b)); shape of τ_t: [1]. (3) Update the local threshold p̃_t = λ·p̃_{t-1} + (1-λ)·(1/μB) Σ_{b=1..μB} q_b; shape of p̃_t: [C]. (4) Update the histogram h̃_t = λ·h̃_{t-1} + (1-λ)·Hist_{μB}(q̂_b); shape of h̃_t: [C]. (5) For c = 1 to C: τ_t(c) = MaxNorm(p̃_t(c))·τ_t (the self-adaptive threshold, SAT). (6) Compute the unsupervised loss L_u = (1/μB) Σ_{b=1..μB} 1(max(q_b) ≥ τ_t(argmax(q_b)))·H(q̂_b, Q_b), where Q_b abbreviates p_m(y\|Ω(u_b)). (7) Compute the expectation of probability on unlabeled data p̄ = (1/μB) Σ_{b=1..μB} 1(max(q_b) ≥ τ_t(argmax(q_b)))·Q_b; shape of p̄: [C]. (8) Compute the histogram h̄ = Hist_{μB}(1(max(q_b) ≥ τ_t(argmax(q_b)))·Q̂_b); shape of h̄: [C]. (9) Compute the fairness loss L_f = H(SumNorm(p̃_t / h̃_t), SumNorm(p̄ / h̄)). Return: L_s + w_u·L_u + w_f·L_f. |
| Open Source Code | Yes | The code can be found at https://github.com/microsoft/Semi-supervised-learning. |
| Open Datasets | Yes | We evaluate FreeMatch on common benchmarks: CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011) and ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper uses common benchmarks and mentions training for a certain number of iterations and a testing phase, but it does not explicitly specify the exact training, validation, and test dataset splits (e.g., percentages or sample counts) needed for reproduction. It mentions selecting the best error rates from 'all checkpoints', which implies a validation step, but the split details are not provided. |
| Hardware Specification | Yes | A single NVIDIA V100 is used for training on CIFAR-10, CIFAR-100, SVHN and STL-10. We use 4 Tesla V100 GPUs on ImageNet. |
| Software Dependencies | No | The paper states, 'For fair comparison, we train and evaluate all methods using the unified codebase TorchSSL (Zhang et al., 2021) with the same backbones and hyperparameters.' It also mentions using SGD and Adam optimizers. However, it does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For fair comparison, we train and evaluate all methods using the unified codebase TorchSSL (Zhang et al., 2021) with the same backbones and hyperparameters. Concretely, we use WideResNet-28-2 (Zagoruyko & Komodakis, 2016) for CIFAR-10, WideResNet-28-8 for CIFAR-100, WideResNet-37-2 (Zhou et al., 2020) for STL-10, and ResNet-50 (He et al., 2016) for ImageNet. We use SGD with a momentum of 0.9 as optimizer. The initial learning rate is 0.03 with a cosine learning rate decay schedule η = η0 · cos(7πk / (16K)), where η0 is the initial learning rate, k (K) is the current (total) training step, and we set K = 2^20 for all datasets. The batch size of labeled data is 64 except for ImageNet where we set 128. We use the same weight decay value, pre-defined threshold τ, unlabeled batch ratio µ and loss weights... For FreeMatch, we set wu = 1 for all experiments. Besides, we set wf = 0.01 for CIFAR-10 with 10 labels, CIFAR-100 with 400 labels, STL-10 with 40 labels, ImageNet with 100k labels, and all experiments for SVHN. For other settings, we use wf = 0.05. The detailed hyperparameters are introduced in Appendix D. |
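The threshold updates in Algorithm 1 (the global EMA in step 2, the per-class EMA in step 3, the MaxNorm scaling in step 5, and the confidence mask in step 6) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation; the names `sat_update`, `select_mask`, and `lam` are hypothetical.

```python
import numpy as np

def sat_update(probs, tau_prev, p_prev, lam=0.999):
    """One EMA step of FreeMatch's Self-Adaptive Thresholding (SAT).

    probs:    (mu_B, C) softmax outputs on the weakly-augmented unlabeled batch (q_b).
    tau_prev: scalar global threshold tau_{t-1}.
    p_prev:   (C,) per-class EMA of mean predictions, p~_{t-1}.
    Returns (tau_t, p~_t, per-class thresholds tau_t(c)).
    """
    max_conf = probs.max(axis=1)                         # max(q_b) per sample
    tau = lam * tau_prev + (1 - lam) * max_conf.mean()   # global threshold (step 2)
    p = lam * p_prev + (1 - lam) * probs.mean(axis=0)    # local threshold (step 3)
    tau_c = (p / p.max()) * tau                          # MaxNorm(p~_t(c)) * tau_t (step 5)
    return tau, p, tau_c

def select_mask(probs, tau_c):
    """Keep a sample if max(q_b) clears the threshold of its argmax class (step 6)."""
    pred = probs.argmax(axis=1)
    return probs.max(axis=1) >= tau_c[pred]
```

In practice `lam` is the paper's EMA decay λ, and the returned mask weights the per-sample cross-entropy terms of L_u.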
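The learning-rate schedule quoted in the setup row can be written as a one-liner; a small sketch, assuming steps k in [0, K] with η0 = 0.03 and K = 2^20 as in the paper (the function name `cosine_lr` is hypothetical):

```python
import math

def cosine_lr(k, K=2**20, eta0=0.03):
    """Cosine decay from the paper: eta = eta0 * cos(7*pi*k / (16*K))."""
    return eta0 * math.cos(7 * math.pi * k / (16 * K))
```

Note the argument only reaches 7π/16 at k = K, so the rate decays monotonically from η0 to about 0.195·η0 and never hits zero.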