OTMatch: Improving Semi-Supervised Learning with Optimal Transport
Authors: Zhiquan Tan, Kaipeng Zheng, Weiran Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on many standard vision and language datasets. The empirical results show improvements of our method over the baselines, demonstrating the effectiveness and superiority of our approach in harnessing semantic relationships to enhance learning performance in a semi-supervised setting. |
| Researcher Affiliation | Collaboration | 1Department of Mathematical Sciences, Tsinghua University 2MIFA Lab, Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University 3Shanghai AI Laboratory. |
| Pseudocode | Yes | Algorithm 1 OTMatch training algorithm at t-th step. (A hedged re-implementation sketch of one such training step appears after this table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their method. |
| Open Datasets | Yes | Based on previous studies (Sohn et al., 2020; Zhang et al., 2021; Wang et al., 2022d), we evaluate our method on widely used vision semi-supervised benchmark datasets, including CIFAR-10/100, STL-10, and ImageNet. Our approach (OTMatch) incorporates the optimal transport loss into the calculation of the unsupervised loss within FreeMatch. Our experiments primarily focus on realistic scenarios with limited labeled data. ... To demonstrate the utility of our approach, we further extend our evaluations to encompass USB datasets (Wang et al., 2022c) of the language modality. Specifically, the results in Table 3 demonstrate that on both Amazon Review and Yelp Review... |
| Dataset Splits | No | The paper states 'The ratio of unlabeled data to labeled data is 7' and '2²⁰ total training iterations' but does not explicitly provide percentages or counts for training, validation, and test splits. It implies the use of standard benchmark datasets, which typically have predefined splits, but the paper itself does not detail them. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'SGD as the optimizer' but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We utilize SGD as the optimizer with a momentum of 0.9 and a weight decay of 5 × 10⁻⁴. The learning rate follows a cosine annealing scheduler, initialized at 0.03. The batch size is set to 64, except for ImageNet where it is 128. The ratio of unlabeled data to labeled data is 7. We report results averaged over multiple random seeds. ... Our training process consists of 2²⁰ total training iterations, where each step involves sampling an equal number of labeled images from all classes. For the hyperparameter settings of our method, we set λ = 0.5 for CIFAR-10, λ = 0.15 for STL-10 and CIFAR-100, and λ = 0.01 for ImageNet. The momentum coefficient of the cost update is set to 0.999. (A configuration sketch translating these values into code appears after this table.) |
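
The paper's Algorithm 1 is only named, not quoted, in the table above. As a reading aid, here is a minimal PyTorch sketch of what one OTMatch-style training step could look like, assuming an entropy-regularized (Sinkhorn) transport loss between the strong-view prediction and the weak-view target distribution, a FreeMatch-style confidence-masked consistency loss, and a class-cost matrix maintained by EMA with the reported momentum of 0.999. The `classifier` attribute, the fixed threshold `tau`, and the cosine-distance cost construction are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sinkhorn_plan(cost, row_marg, col_marg, eps=0.1, n_iters=50):
    """Entropy-regularized OT plan between two distributions over classes."""
    G = torch.exp(-cost / eps)              # Gibbs kernel from the cost matrix
    u = torch.ones_like(row_marg)
    for _ in range(n_iters):                # Sinkhorn fixed-point updates
        v = col_marg / (G.t() @ u + 1e-8)
        u = row_marg / (G @ v + 1e-8)
    return u[:, None] * G * v[None, :]      # transport plan

def otmatch_step(model, x_lb, y_lb, x_w, x_s, cost_ema,
                 lam=0.5, tau=0.95, m=0.999):
    # Supervised cross-entropy on the labeled batch.
    loss_sup = F.cross_entropy(model(x_lb), y_lb)

    # Pseudo-labels from the weakly augmented view (no gradient).
    with torch.no_grad():
        p_w = F.softmax(model(x_w), dim=-1)
        conf, pseudo = p_w.max(dim=-1)
        mask = conf.ge(tau).float()         # confidence mask, FreeMatch-style

    # Consistency loss on the strongly augmented view.
    logits_s = model(x_s)
    loss_cons = (F.cross_entropy(logits_s, pseudo, reduction="none") * mask).mean()

    # OT loss: transport cost between strong-view predictions and weak-view targets.
    p_s = F.softmax(logits_s, dim=-1)
    loss_ot = p_s.new_zeros(())
    for i in range(p_s.size(0)):            # per-sample Sinkhorn, unvectorized for clarity
        plan = sinkhorn_plan(cost_ema, p_s[i], p_w[i])
        loss_ot = loss_ot + (plan * cost_ema).sum()
    loss_ot = loss_ot / p_s.size(0)

    # EMA update of the class-cost matrix; deriving costs from classifier-weight
    # cosine distances is an assumption made here for illustration.
    with torch.no_grad():
        W = F.normalize(model.classifier.weight, dim=-1)
        cost_ema.mul_(m).add_(1.0 - W @ W.t(), alpha=1.0 - m)

    return loss_sup + loss_cons + lam * loss_ot
```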
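
Likewise, the Experiment Setup row translates almost line-for-line into an optimizer configuration. The sketch below wires up the reported values (SGD with momentum 0.9 and weight decay 5 × 10⁻⁴, cosine-annealed learning rate starting at 0.03, batch size 64 with a 7:1 unlabeled-to-labeled ratio, 2²⁰ total iterations); the placeholder backbone and the choice of `CosineAnnealingLR` are assumptions, since the quoted setup names neither the network architecture nor the exact scheduler implementation.

```python
import torch

# Hyperparameters as reported in the Experiment Setup row.
TOTAL_ITERS = 2 ** 20        # total training iterations
LABELED_BS, MU = 64, 7       # batch size 64 (128 on ImageNet); unlabeled:labeled = 7
UNLABELED_BS = LABELED_BS * MU
LAMBDA = {"cifar10": 0.5, "cifar100": 0.15, "stl10": 0.15, "imagenet": 0.01}
COST_MOMENTUM = 0.999        # momentum coefficient of the cost update

# Placeholder backbone so the snippet runs stand-alone; the paper's network
# architecture is not specified in the quoted setup.
model = torch.nn.Linear(3 * 32 * 32, 10)

optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=5e-4)

# One common realization of "cosine annealing initialized at 0.03".
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_ITERS)
```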