How Does Semi-supervised Learning with Pseudo-labelers Work? A Case Study

Authors: Yiwen Kou, Zixiang Chen, Yuan Cao, Quanquan Gu

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we perform numerical experiments on synthetic datasets, generated according to Definition 3.1, to verify our main theoretical results. The code and data for our experiments can be found on Github." |
| Researcher Affiliation | Academia | Yiwen Kou (1), Zixiang Chen (1), Yuan Cao (2,3), Quanquan Gu (1); (1) Department of Computer Science, University of California, Los Angeles; (2) Department of Statistics and Actuarial Science, The University of Hong Kong; (3) Department of Mathematics, The University of Hong Kong |
| Pseudocode | No | The paper describes the algorithms and training procedures mathematically and textually but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | "The code and data for our experiments can be found on Github" (https://github.com/uclaml/SSL_Pseudo_Labeler) |
| Open Datasets | No | The paper states that experiments are performed on "synthetic datasets, generated according to Definition 3.1". It does not provide a link or citation to a pre-existing publicly available dataset. |
| Dataset Splits | No | The paper specifies a labeled training sample size n_l = 20 and a pseudo-labeled training sample size n_u = 20000 but does not explicitly describe training/validation/test splits or cross-validation settings. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments. |
| Software Dependencies | No | The paper mentions the activation function σ(z) = [z]_+^3 but does not specify any software libraries or their version numbers. |
| Experiment Setup | Yes | "In particular, we set the problem dimension d = 10000, labeled training sample size n_l = 20 (10 positive samples and 10 negative samples), pseudo-labeled training sample size n_u = 20000 (10000 positive samples and 10000 negative samples), feature vector v sampled from distribution N(0, I) and noise vector sampled from distribution N(0, σ_p^2 I), where σ_p = 10d^{0.01}. ... network width m = 20, activation function σ(z) = [z]_+^3, regularization parameter λ = 0.1 and learning rate η = 1 × 10^{-4}. Besides, we initialize CNN parameters from N(0, σ_0^2), where σ_0 = 0.1 d^{-3/4}. After 200 iterations, ... By applying learning rate η = 0.1 and after T = 100 iterations." |
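The reported setup is concrete enough to sketch in code. Below is a minimal NumPy sketch of the synthetic-data generation and the cubed-ReLU activation using the stated parameters (d = 10000, n_l = 20, σ_p = 10d^{0.01}, σ(z) = [z]_+^3). The single-patch form x = y·v + ξ is an assumption made here for illustration; the paper's Definition 3.1 may use a richer (e.g., multi-patch signal/noise) structure, so treat this as a sketch rather than the authors' exact data model.

```python
import numpy as np

# Parameters quoted in the Experiment Setup row.
d = 10_000                  # problem dimension
n_l = 20                    # labeled samples (10 positive, 10 negative)
sigma_p = 10 * d ** 0.01    # noise standard deviation sigma_p = 10 * d^0.01

rng = np.random.default_rng(0)

# Feature vector v ~ N(0, I), as stated in the setup.
v = rng.standard_normal(d)

def sigma(z):
    """Activation sigma(z) = [z]_+^3 (ReLU raised to the third power)."""
    return np.maximum(z, 0.0) ** 3

def make_dataset(n, rng):
    """Balanced labels y in {+1, -1}; x = y * v + xi with xi ~ N(0, sigma_p^2 I).
    NOTE: the single-patch form x = y * v + xi is an assumption; the paper's
    Definition 3.1 may instead place signal and noise in separate patches."""
    y = np.repeat([1, -1], n // 2)
    xi = sigma_p * rng.standard_normal((n, d))
    X = y[:, None] * v[None, :] + xi
    return X, y

X_l, y_l = make_dataset(n_l, rng)
```

The pseudo-labeled set would be drawn the same way with n_u = 20000 unlabeled inputs, whose labels are then produced by the pseudo-labeler rather than taken from y.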