SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning
Authors: Hao Chen, Ran Tao, Yue Fan, Yidong Wang, Jindong Wang, Bernt Schiele, Xing Xie, Bhiksha Raj, Marios Savvides
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2Max Planck Institute for Informatics, Saarland Informatics Campus, 3Microsoft Research Asia, 4Mohamed bin Zayed University of AI |
| Pseudocode | Yes | Algorithm 1: SoftMatch algorithm. |
| Open Source Code | Yes | More recent results of SoftMatch are included in USB along with its updates; refer to https://github.com/Hhhhhhao/SoftMatch for details. |
| Open Datasets | Yes | For the classic image classification setting, we evaluate on CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), STL-10 (Coates et al., 2011), and ImageNet (Deng et al., 2009)... We further evaluate SoftMatch on the text topic classification tasks of AG News and DBpedia, and the sentiment tasks of IMDb, Amazon-5, and Yelp-5 (Maas et al., 2011; Zhang et al., 2015). |
| Dataset Splits | Yes | We split a validation set from the training data to evaluate the algorithms. For IMDb and AG News, we randomly sample 1,000 and 2,500 samples per class, respectively, as the validation set, and the remaining data is used as the training set. For Amazon-5 and Yelp-5, we randomly sample 5,000 and 50,000 samples per class as the validation set and training set, respectively. For DBpedia, the validation set and training set consist of 1,000 and 10,000 samples per class. |
| Hardware Specification | Yes | We use NVIDIA V100 for training of classic image classification. ... NVIDIA V100 is used to train long-tailed image classification, and the training time is around 1 day. ... We use NVIDIA V100 to train all text classification models. |
| Software Dependencies | No | The paper mentions software components such as 'BERT-Base', the 'AdamW optimizer', 'TorchSSL', 'USB', and 'fairseq', but does not provide the specific version numbers required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | For all experiments, we use the SGD optimizer with a momentum of 0.9, where the initial learning rate η_0 is set to 0.03. We adopt the cosine learning-rate annealing scheme to adjust the learning rate, with a total of 2^20 training steps. The labeled batch size B_L is set to 64 and the unlabeled batch size B_U is set to 7 times B_L for all datasets. We set m to 0.999 and divide the estimated variance σ̂_t by 4 for 2σ of the Gaussian function (illustrated in the sketch after this table). |
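
The quoted setup specifies the EMA momentum m = 0.999 and dividing the estimated variance σ̂_t by 4 so that the Gaussian weighting effectively uses 2σ. The sketch below shows how such a truncated-Gaussian per-sample weight could be computed from those quantities. The class name `GaussianWeighting`, the initial values of the running mean and variance, and the epsilon term are assumptions introduced here for illustration; this is a reading of the quoted description, not the authors' released implementation.

```python
import torch


class GaussianWeighting:
    """Sketch of a truncated-Gaussian per-sample weight for pseudo-labels.

    Hypothetical helper: tracks EMA estimates of the mean and variance of the
    model's max confidence on unlabeled data, then weights each sample by a
    Gaussian centered at that running mean (capped at lambda_max above it).
    """

    def __init__(self, num_classes: int = 10, ema_momentum: float = 0.999,
                 lambda_max: float = 1.0):
        self.m = ema_momentum          # EMA momentum m from the quoted setup (0.999)
        self.lambda_max = lambda_max   # maximum per-sample weight
        self.mu = 1.0 / num_classes    # running mean of max confidence (assumed init)
        self.var = 1.0                 # running variance of max confidence (assumed init)

    @torch.no_grad()
    def __call__(self, probs: torch.Tensor) -> torch.Tensor:
        # probs: (B, C) softmax outputs on weakly augmented unlabeled data
        max_probs, _ = probs.max(dim=-1)
        # EMA updates of the confidence statistics
        self.mu = self.m * self.mu + (1 - self.m) * max_probs.mean().item()
        self.var = self.m * self.var + (1 - self.m) * max_probs.var().item()
        # Divide the estimated variance by 4 so the Gaussian effectively uses 2*sigma,
        # as stated in the quoted setup; eps avoids division by zero.
        weights = self.lambda_max * torch.exp(
            -((max_probs - self.mu) ** 2) / (2 * self.var / 4 + 1e-8)
        )
        # Samples at least as confident as the running mean keep the maximum weight.
        return torch.where(max_probs >= self.mu,
                           torch.full_like(weights, self.lambda_max),
                           weights)
```

As a usage example, `GaussianWeighting(num_classes=10)(torch.softmax(logits, dim=-1))` would return per-sample weights in [0, λ_max] that multiply the unsupervised consistency loss of each unlabeled example.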