Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Authors: Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results on CIFAR-10/100 [Krizhevsky and Hinton, 2009] to demonstrate the efficacy of combining DAC and RKD (i.e., the local and global perspectives of clustering) for semi-supervised learning in the low-label-rate regime. |
| Researcher Affiliation | Academia | Yijun Dong, Courant Institute of Mathematical Sciences, New York University, New York, NY, yd1319@nyu.edu; Kevin Miller, Oden Institute for Computational Engineering & Science, University of Texas at Austin, Austin, TX, ksmiller@utexas.edu; Qi Lei, Courant Institute of Mathematical Sciences & Center of Data Science, New York University, New York, NY, ql518@nyu.edu; Rachel Ward, Oden Institute for Computational Engineering & Science, University of Texas at Austin, Austin, TX, rward@math.utexas.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The experiment code can be found at https://github.com/dyjdongyijun/Semi_Supervised_Knowledge_Distillation. |
| Open Datasets | Yes | In this section, we present experimental results on CIFAR-10/100 [Krizhevsky and Hinton, 2009] to demonstrate the efficacy of combining DAC and RKD (i.e., the local and global perspectives of clustering) for semi-supervised learning in the low-label-rate regime. |
| Dataset Splits | No | The paper mentions 'The average and standard deviation of the best test accuracy (i.e., early stopping with the maximum patience 128) are reported', which implies a validation process, but it explicitly uses 'test accuracy' and does not specify a separate validation set or its size/split for early stopping or hyperparameter tuning. |
| Hardware Specification | Yes | Both CIFAR-10/100 experiments are conducted on one NVIDIA A40 GPU. |
| Software Dependencies | No | The GitHub repository name references `pytorch_cifar10`, but the paper does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or other dependencies. |
| Experiment Setup | Yes | Throughout the experiments, we used weight decay 0.0005. We train the student model via stochastic gradient descent (SGD) with Nesterov momentum 0.9 for 2^17 iterations (batches) with a batch size 64 × 8 = 2^9 (consisting of 64 labeled samples and 64 × 7 unlabeled samples). The initial learning rate is 0.03, decaying with a cosine scheduler. The test accuracies are evaluated... on an EMA model with a decay rate 0.999. |
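
The quoted experiment setup maps onto a standard semi-supervised training loop. Below is a minimal, hedged PyTorch sketch of that configuration only (SGD with Nesterov momentum 0.9, weight decay 0.0005, initial LR 0.03 with cosine decay over 2^17 iterations, batches of 64 labeled + 64 × 7 unlabeled samples, and an EMA copy with decay 0.999 used for evaluation). The model, data, and loss here are placeholders; the paper's actual DAC/RKD objectives and backbone live in the linked repository.

```python
# Hedged sketch of the reported optimization setup; not the authors' code.
import copy
import torch
import torch.nn as nn

TOTAL_ITERS = 2 ** 17   # training iterations (batches), as reported
BATCH_LABELED = 64      # labeled samples per batch
MU = 7                  # unlabeled-to-labeled ratio -> 64 * 7 unlabeled samples

# Placeholder model standing in for the paper's student network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
ema_model = copy.deepcopy(model)        # EMA copy evaluated at test time
for p in ema_model.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_ITERS)

def ema_update(student, ema, decay=0.999):
    """In-place exponential moving average of the student parameters."""
    with torch.no_grad():
        for p_s, p_e in zip(student.parameters(), ema.parameters()):
            p_e.mul_(decay).add_(p_s, alpha=1.0 - decay)

for step in range(TOTAL_ITERS):
    # Placeholder batch: 64 labeled + 64*7 unlabeled CIFAR-sized images.
    x_lab = torch.randn(BATCH_LABELED, 3, 32, 32)
    y_lab = torch.randint(0, 10, (BATCH_LABELED,))
    x_unlab = torch.randn(BATCH_LABELED * MU, 3, 32, 32)

    loss = nn.functional.cross_entropy(model(x_lab), y_lab)
    # The unsupervised DAC/RKD terms would be added to `loss` here (omitted).

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    ema_update(model, ema_model)
```

Evaluation in this setup would then be run on `ema_model` rather than `model`, with early stopping on the best accuracy under the reported patience of 128.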