Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering

Authors: Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experimental results on CIFAR-10/100 [Krizhevsky and Hinton, 2009] to demonstrate the efficacy of combining DAC and RKD (i.e., the local and global perspectives of clustering) for semi-supervised learning in the low-label-rate regime.
Researcher Affiliation | Academia | Yijun Dong, Courant Institute of Mathematical Sciences, New York University, New York, NY, yd1319@nyu.edu; Kevin Miller, Oden Institute for Computational Engineering & Science, University of Texas at Austin, Austin, TX, ksmiller@utexas.edu; Qi Lei, Courant Institute of Mathematical Sciences & Center of Data Science, New York University, New York, NY, ql518@nyu.edu; Rachel Ward, Oden Institute for Computational Engineering & Science, University of Texas at Austin, Austin, TX, rward@math.utexas.edu
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The experiment code can be found at https://github.com/dyjdongyijun/Semi_Supervised_Knowledge_Distillation.
Open Datasets | Yes | In this section, we present experimental results on CIFAR-10/100 [Krizhevsky and Hinton, 2009] to demonstrate the efficacy of combining DAC and RKD (i.e., the local and global perspectives of clustering) for semi-supervised learning in the low-label-rate regime.
Dataset Splits | No | The paper states that 'The average and standard deviation of the best test accuracy (i.e., early stopping with the maximum patience 128) are reported', which implies some model-selection process, but the reported metric is test accuracy and no separate validation set, split sizes, or tuning protocol is specified for early stopping or hyperparameter selection.
Hardware Specification | Yes | Both CIFAR-10/100 experiments are conducted on one NVIDIA A40 GPU.
Software Dependencies | No | The linked repository references `pytorch_cifar10`, but the paper does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or other dependencies.
Experiment Setup | Yes | Throughout the experiments, we used weight decay 0.0005. We train the student model via stochastic gradient descent (SGD) with Nesterov momentum 0.9 for 2^17 iterations (batches) with a batch size 64 × 8 = 2^9 (consisting of 64 labeled samples and 64 × 7 unlabeled samples). The initial learning rate is 0.03, decaying with a cosine scheduler. The test accuracies are evaluated... on an EMA model with a decay rate 0.999.
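
The experiment setup row above maps onto a short PyTorch sketch, shown below. This is a minimal illustration, not the authors' code: the backbone is a placeholder, random tensors stand in for the CIFAR batches, a plain cross-entropy loss replaces the paper's DAC/RKD objectives, and CosineAnnealingLR plus a manual EMA update are one plausible reading of "cosine scheduler" and "EMA model with a decay rate 0.999".

```python
# Hypothetical sketch of the quoted training configuration (assumptions noted inline).
import copy
import torch

total_iters = 2 ** 17                  # 2^17 SGD iterations (batches)
labeled_bs, unlabeled_bs = 64, 64 * 7  # 64 labeled + 448 unlabeled = 2^9 samples per batch

student = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder for the real backbone
ema_model = copy.deepcopy(student)          # EMA copy used for test evaluation
ema_decay = 0.999

optimizer = torch.optim.SGD(
    student.parameters(),
    lr=0.03,              # initial learning rate
    momentum=0.9,
    nesterov=True,
    weight_decay=5e-4,    # weight decay 0.0005
)
# The paper only says "cosine scheduler"; CosineAnnealingLR is one plausible choice.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iters)

@torch.no_grad()
def ema_update(ema, model, decay=ema_decay):
    """Exponential moving average of the student weights (decay rate 0.999)."""
    for p_ema, p in zip(ema.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

# Illustrative loop: random tensors stand in for labeled CIFAR batches, and
# cross-entropy stands in for the DAC/RKD loss; the unlabeled stream is omitted.
for step in range(3):  # would be range(total_iters) in a real run
    x = torch.randn(labeled_bs, 3 * 32 * 32)
    y = torch.randint(0, 10, (labeled_bs,))
    loss = torch.nn.functional.cross_entropy(student(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    ema_update(ema_model, student)
```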