Reinforcement Learning Guided Semi-Supervised Learning
Authors: Marzi Heidari, Hanping Zhang, Yuhong Guo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistent superior performance compared to state-of-the-art SSL methods. |
| Researcher Affiliation | Academia | Marzi Heidari1, Hanping Zhang1, Yuhong Guo1,2 1School of Computer Science, Carleton University, Ottawa, Canada 2CIFAR AI Chair, Amii, Canada {marziheidari@cmail, jagzhang@cmail, yuhong.guo}.carleton.ca |
| Pseudocode | Yes | Algorithm 1 Pseudo-Label Based Policy Gradient Descent (an illustrative policy-gradient sketch follows the table) |
| Open Source Code | No | The code is currently not shared. (Justification from NeurIPS Paper Checklist, Section 5) |
| Open Datasets | Yes | Datasets We conducted comprehensive experiments on four image classification benchmarks: CIFAR-10, CIFAR-100 [43], SVHN [44], and STL-10 [45]. |
| Dataset Splits | Yes | We adhere to the conventional dataset splits used in the literature. Consistent with previous works, on each dataset we preserved the labels of a randomly selected subset of training samples with an equal number of samples for each class, and left the remaining samples unlabeled. In order to compare with previous works in the same settings, we performed experiments on CIFAR-10 with various numbers (N_l ∈ {250, 1,000, 2,000, 4,000}) of labeled samples, on CIFAR-100 with 2,500, 10,000 and 4,000 labeled samples, on SVHN with 1,000 and 500 labeled samples, and on STL-10 with 1,000 labeled images. (A class-balanced split sketch follows the table.) |
| Hardware Specification | Yes | Our experiments were performed on setups featuring CPUs with 8 Intel Core processors and 64 GB of RAM. For graphics processing units, we utilized NVIDIA GeForce RTX 3060 cards, each offering 12 GB of VRAM. |
| Software Dependencies | No | The paper mentions optimizers (SGD) and learning rate techniques but does not specify version numbers for any software libraries like PyTorch or TensorFlow, nor programming language versions like Python. |
| Experiment Setup | Yes | For training the CNN-13 architecture, we employed the SGD optimizer with a Nesterov momentum of 0.9. We used an L2 regularization coefficient of 1e-4 for CIFAR-10 and CIFAR-100, and 5e-5 for SVHN. The initial learning rate was set to 0.1, and the cosine learning rate annealing technique proposed in previous studies [46, 15] was utilized. For the WRN-28-2 architecture, we followed the suggestion from MixMatch [1] and used an L2 regularization coefficient of 4e-4. For WRN-37-2, the training configuration includes the SGD optimizer, an L2 regularization coefficient of 5e-4, and an initial learning rate of 0.01. Finally, the training configuration for the WRN-28-8 model includes using the SGD optimizer, an L2 regularization coefficient of 0.001, and starting with a learning rate of 0.01. To compute the parameters of the teacher model, we employed the EMA method with a decay rate β = 0.999. We selected all hyperparameters and training techniques based on relevant studies to ensure a fair comparison between our approach and the existing methods. Specifically for RLGSSL, we set the batch size to 128, and set λ1 = λ2 = 0.1. We first pre-train the model for 50 epochs using the Mean-Teacher algorithm and then proceed to the training procedure of RLGSSL for 400 epochs. (A PyTorch sketch of this configuration follows the table.) |
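The paper's Algorithm 1 (Pseudo-Label Based Policy Gradient Descent) is available only as pseudocode, since no code is released. The sketch below is a generic REINFORCE-style update in which sampled pseudo-labels play the role of actions; it is not the paper's exact algorithm, and the reward definition (agreement with an EMA teacher), function names, and absence of the paper's λ-weighted loss terms are our assumptions.

```python
# Hedged, generic sketch of a pseudo-label based policy-gradient update.
# NOT the paper's Algorithm 1: the reward and structure are illustrative only.
import torch


def policy_gradient_step(student, teacher, optimizer, x_unlabeled):
    # Treat the student's predictive distribution as a stochastic policy over labels.
    logits_u = student(x_unlabeled)
    dist = torch.distributions.Categorical(logits=logits_u)

    # Sample pseudo-labels (the "actions") and keep their log-probabilities.
    pseudo_labels = dist.sample()
    log_probs = dist.log_prob(pseudo_labels)

    # Illustrative per-sample reward: agreement of the sampled pseudo-label with
    # the EMA teacher's prediction (the paper's actual reward may differ).
    with torch.no_grad():
        teacher_pred = teacher(x_unlabeled).argmax(dim=1)
        reward = (pseudo_labels == teacher_pred).float()

    # REINFORCE-style objective: maximize the expected reward.
    loss = -(reward * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```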
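The class-balanced labeled/unlabeled split described in the Dataset Splits row can be reproduced with a short helper like the one below; the function name, seeding, and NumPy implementation are illustrative choices, not taken from the paper.

```python
# Minimal sketch of a class-balanced labeled/unlabeled split (names are ours).
import numpy as np


def split_labeled_unlabeled(labels, n_labeled, num_classes, seed=0):
    """Keep n_labeled samples with equal counts per class; the rest are unlabeled."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    per_class = n_labeled // num_classes

    labeled_idx = []
    for c in range(num_classes):
        idx_c = np.flatnonzero(labels == c)
        labeled_idx.extend(rng.choice(idx_c, size=per_class, replace=False))

    labeled_idx = np.array(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(len(labels)), labeled_idx)
    return labeled_idx, unlabeled_idx


# Example: CIFAR-10 with N_l = 4,000 labeled samples (400 per class).
# labeled_idx, unlabeled_idx = split_labeled_unlabeled(train_labels, 4000, 10)
```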
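The Experiment Setup row reports hyperparameters, but the paper names no framework or library versions. A hedged PyTorch sketch of the CNN-13 configuration (SGD with Nesterov momentum 0.9, L2 coefficient 1e-4 for CIFAR-10/100, initial learning rate 0.1, cosine annealing, EMA teacher with β = 0.999) might look as follows; the helper names and the use of PyTorch are our assumptions.

```python
# Hedged sketch of the reported CNN-13 training configuration in PyTorch.
import copy
import torch


def build_training_setup(student, epochs=400, lr=0.1, weight_decay=1e-4):
    # SGD with Nesterov momentum 0.9 and L2 regularization (1e-4 for CIFAR-10/100).
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9,
                                nesterov=True, weight_decay=weight_decay)
    # Cosine learning rate annealing over the full training run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # EMA teacher initialized as a frozen copy of the student.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return optimizer, scheduler, teacher


@torch.no_grad()
def ema_update(teacher, student, beta=0.999):
    # Teacher parameters track an exponential moving average of the student's.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(beta).add_(s_p, alpha=1.0 - beta)
```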