Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

Authors: Qian Shao, Jiangrui Kang, Qiyuan Chen, Zepeng Li, Hongxia Xu, Yiwen Cao, Jiajuan Liang, Jian Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets. Our code is available at RDSS.
Researcher Affiliation | Collaboration | Qian Shao (1,3), Jiangrui Kang (2), Qiyuan Chen (1,3), Zepeng Li (4), Hongxia Xu (1,3), Yiwen Cao (2), Jiajuan Liang (2), and Jian Wu (1). 1: College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University; 2: BNU-HKBU United International College; 3: WeDoctor Cloud; 4: The State Key Laboratory of Blockchain and Data Security, Zhejiang University.
Pseudocode | Yes | Algorithm 1: Generalized Kernel Herding without Replacement; Algorithm 2: Generalized Kernel Herding. A hedged selection sketch in this spirit appears after the table.
Open Source Code | Yes | Our code is available at RDSS.
Open Datasets | Yes | We choose five common datasets: CIFAR-10/100 [19], SVHN [30], STL-10 [9] and ImageNet [10].
Dataset Splits | No | For CIFAR-10 and CIFAR-100, the paper states that "50,000 images are for training, and 10,000 images are for testing", which describes a train/test split. However, it does not explicitly mention a separate validation split, nor how one was derived and used for hyperparameter tuning or early stopping.
Hardware Specification | Yes | Experiments are run on 8× NVIDIA Tesla A100 (40 GB) GPUs and 2× Intel 6248R 24-core processors.
Software Dependencies | No | The paper mentions software such as "CLIP [33]", "ResNet-50 [16]", "WideResNet-28-2 [62]", and the "Unified SSL Benchmark (USB) [52]", as well as the use of "standard stochastic gradient descent (SGD)". However, it does not provide version numbers for these components or the underlying libraries.
Experiment Setup | Yes | The optimizer for all experiments is standard stochastic gradient descent (SGD) with a momentum of 0.9. The initial learning rate is 0.03 with a learning rate decay of 0.0005. We use ResNet-50 [16] for the ImageNet experiment and WideResNet-28-2 [62] for other datasets. The value of σ for the Gaussian kernel function is set as explained in Section 6. To ensure diversity in the sampled data, we introduce a penalty factor given by α = 1 - 1/m, where m denotes the number of selected samples. Concretely, we set m = {40, 250, 4000} for CIFAR-10, m = {400, 2500, 10000} for CIFAR-100, m = {250, 1000} for SVHN, m = {40, 250} for STL-10 and m = {100000} for ImageNet. A hedged configuration sketch of these settings follows the table.
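The paper's Algorithms 1 and 2 are not reproduced in this report; the following is only a minimal sketch of kernel-herding-style greedy selection without replacement, assuming a Gaussian kernel and the diversity penalty α = 1 - 1/m quoted in the experiment-setup row. The function names (`gaussian_kernel`, `select_samples`) and the exact scoring rule are illustrative assumptions, not the RDSS implementation.

```python
# Minimal sketch: greedy kernel-herding-style selection without replacement.
# NOTE: illustrative only; it follows the generic kernel herding recipe with a
# diversity penalty alpha, not the paper's exact Algorithms 1-2. Names such as
# gaussian_kernel, select_samples, sigma, and the scoring rule are assumptions.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def select_samples(X, m, sigma=1.0):
    """Greedily pick m indices that are representative of X yet mutually diverse."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)   # pairwise similarities
    mean_sim = K.mean(axis=1)          # representativeness: similarity to the whole pool
    alpha = 1.0 - 1.0 / m              # diversity penalty quoted in the experiment setup
    selected, remaining = [], set(range(n))
    for _ in range(m):
        best_idx, best_score = None, -np.inf
        for i in remaining:
            # Penalise similarity to already-selected points to encourage diversity.
            redundancy = K[i, selected].mean() if selected else 0.0
            score = mean_sim[i] - alpha * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

In use, X would be feature embeddings of the unlabeled pool (the paper mentions CLIP features) and m one of the labeling budgets listed in the experiment-setup row.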
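As a companion to the experiment-setup row, here is a short configuration sketch of the reported hyperparameters, assuming a PyTorch training loop. The backbone below is a placeholder (the paper uses WideResNet-28-2 and ResNet-50), and mapping the reported 0.0005 "decay" to SGD weight decay is an assumption rather than something the excerpt states.

```python
# Configuration sketch only, assuming a PyTorch training loop.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder backbone

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.03,            # initial learning rate reported above
    momentum=0.9,       # SGD momentum reported above
    weight_decay=5e-4,  # assumption: interpreting the reported 0.0005 decay as weight decay
)

# Labelling budgets m reported per dataset; the penalty factor is alpha = 1 - 1/m.
budgets = {
    "CIFAR-10": [40, 250, 4000],
    "CIFAR-100": [400, 2500, 10000],
    "SVHN": [250, 1000],
    "STL-10": [40, 250],
    "ImageNet": [100_000],
}
alpha = {name: [1 - 1 / m for m in ms] for name, ms in budgets.items()}
```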