VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

Authors: Shijie Fang, Qianhan Feng, Tong Lin

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our methods are effective across multiple datasets and settings, reducing classification error rates and saving training time. Together, VCC-INFUSE reduces the error rate of FlexMatch on the CIFAR-100 dataset by 1.08% while saving nearly half of the training time. We evaluate the effectiveness of our method on standard semi-supervised learning (SSL) datasets: CIFAR-10/100 [Krizhevsky et al., 2009], SVHN [Netzer et al., 2011], STL-10 [Coates et al., 2011]. (Section 5.3, Ablation Study.)
Researcher Affiliation | Collaboration | Shijie Fang¹,², Qianhan Feng¹, Tong Lin¹. ¹National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ²Google, Shanghai, China. shijiefang@google.com, fengqianhan@stu.pku.edu.cn, lintong@pku.edu.cn
Pseudocode | No | The paper describes the methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | We evaluate the effectiveness of our method on standard semi-supervised learning (SSL) datasets: CIFAR-10/100 [Krizhevsky et al., 2009], SVHN [Netzer et al., 2011], STL-10 [Coates et al., 2011]. (A short loading sketch for these public datasets appears after the table.)
Dataset Splits | No | The paper states that it uses a validation set (the objective min L(V, θ)) and that it follows "the commonly used SSL setting [Sohn et al., 2020]", but it does not explicitly provide the percentages, sample counts, or a precise split protocol for the train/validation/test partitions needed for reproduction.
Hardware Specification | Yes | VCC-INFUSE-FlexMatch ... also decreases the training time from 223.96 GPU hours to 115.47 GPU hours (-48.44%). (Ours, keep ratio = 40%) ... The error rate and GPU hours on A100 of different methods on the CIFAR-100 dataset with 2500 labeled data. (The quoted percentage is re-checked in a snippet after the table.)
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., "PyTorch 1.9" or "Python 3.8").
Experiment Setup | Yes | The total number of iterations is 2^20 (segmented into 1024 epochs) and the batch size of labeled/unlabeled data is 64/448. We use SGD to optimize the parameters. The learning rate is initially set as η₀ = 0.03 with a cosine learning rate decay schedule... As for VCC, the size of the random noise z is set as 16... The encoder q_ϕ and decoder p_θ are MLPs with 2 hidden layers (with dimensions 256 and 64). λ_VCC is set as 2.0. In INFUSE, the core set is updated every 40 epochs, and the total number of iterations is adjusted with the keep ratio k.
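
All four benchmarks quoted in the Open Datasets row are available through torchvision. The snippet below is a minimal loading sketch of ours, not code from the paper; it only shows that the raw datasets are publicly downloadable and does not reproduce the paper's labeled/unlabeled SSL partitioning.

```python
# Minimal sketch (ours, not from the paper): download the four public
# SSL benchmarks via torchvision. The paper's labeled/unlabeled splits
# are not reproduced here.
from torchvision import datasets

root = "./data"  # hypothetical download directory
cifar10 = datasets.CIFAR10(root, train=True, download=True)
cifar100 = datasets.CIFAR100(root, train=True, download=True)
svhn = datasets.SVHN(root, split="train", download=True)
stl10 = datasets.STL10(root, split="train", download=True)  # STL-10 also ships an "unlabeled" split

print(len(cifar10), len(cifar100), len(svhn), len(stl10))
```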
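
The -48.44% training-time reduction quoted in the Hardware Specification row follows directly from the two GPU-hour figures; this short check just confirms the arithmetic.

```python
# Confirm the quoted reduction: 223.96 -> 115.47 GPU hours on an A100.
before, after = 223.96, 115.47
print(f"-{(before - after) / before:.2%}")  # prints -48.44%
```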
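
To make the Experiment Setup row concrete, here is a hedged PyTorch sketch of the quoted hyperparameters and the VCC encoder/decoder shapes. Only the numbers come from the paper (2^20 iterations, 64/448 batch sizes, η₀ = 0.03 with cosine decay, latent size 16, hidden dimensions 256 and 64, λ_VCC = 2.0); the module names, the encoder's input dimension, the momentum value, and the exact cosine schedule are our assumptions.

```python
# Hedged reconstruction of the reported setup. The numeric values are
# quoted from the paper; names, the encoder input dimension, momentum,
# and the exact schedule are our assumptions.
import torch
import torch.nn as nn

TOTAL_ITERS = 2 ** 20            # "segmented into 1024 epochs"
BATCH_LABELED, BATCH_UNLABELED = 64, 448
LATENT_DIM = 16                  # size of the random noise z in VCC
LAMBDA_VCC = 2.0                 # weight of the VCC loss term


def vcc_mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    """MLP with 2 hidden layers of dimensions 256 and 64, as quoted."""
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, out_dim),
    )


num_classes = 100                                 # e.g. CIFAR-100
encoder = vcc_mlp(num_classes, 2 * LATENT_DIM)    # q_phi: mean and log-variance of z (assumed)
decoder = vcc_mlp(LATENT_DIM, num_classes)        # p_theta, with the same quoted hidden sizes

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.SGD(params, lr=0.03, momentum=0.9)  # momentum is our guess
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_ITERS)
```

Note that FixMatch-style codebases typically decay the learning rate as η₀·cos(7πk/16K) rather than with vanilla cosine annealing; the stock PyTorch scheduler is used above purely for illustration.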