Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels

Authors: Zebin You, Yong Zhong, Fan Bao, Jiacheng Sun, Chongxuan Li, Jun Zhu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, DPT consistently achieves SOTA performance on semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256×256. Besides, DPT outperforms competitive semi-supervised baselines substantially on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively.
Researcher Affiliation | Collaboration | Zebin You (1,2), Yong Zhong (1,2), Fan Bao (3), Jiacheng Sun (4), Chongxuan Li (1,2), Jun Zhu (3). 1: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2: Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; 3: Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch ML Center, Tsinghua University; 4: Huawei Noah's Ark Lab
Pseudocode | Yes | We provide the pseudocode of DPT in the style of PyTorch in Appendix B. (A PyTorch-style sketch of the three-stage pipeline follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/ML-GSAI/DPT.
Open Datasets | Yes | We evaluate DPT on the ImageNet [20] dataset, which consists of 1,281,167 training and 50,000 validation images. We also evaluate DPT on CIFAR-10 (see detailed experiments in Appendix A).
Dataset Splits | Yes | We evaluate DPT on the ImageNet [20] dataset, which consists of 1,281,167 training and 50,000 validation images. The labeled and unlabeled data split is the same as that of the corresponding methods [17, 16]. (An illustrative split helper is sketched after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU or CPU models.
Software Dependencies | No | We provide the pseudocode of DPT in the style of PyTorch in Appendix B. The paper mentions PyTorch but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | In the first and third stages, we use the same pre-processing protocol for real images as the baselines [17, 16]. For instance, in MSN, the real data are resized to 256×256 and then center-cropped to 224×224. In the second stage, real images are center-cropped to the target resolution following [5]. In the third stage, we consider pseudo images at resolution 256×256 and center-crop them to 224×224. For a fair comparison, we use the exact same architectures and hyperparameters as the baselines [17, 16, 5]. In particular, for MSN-based DPT, we use a ViT-B/4 (or a ViT-L/7) model [17] for classification and a U-ViT-Large (or a U-ViT-Huge) model [5] for conditional generation. We conduct detailed ablation experiments on the number of augmented pseudo images per class (i.e., K) and the classifier-free guidance scale (i.e., CFG) in Appendix G and find that the optimal K value is 128 and the optimal CFG values for different ImageNet resolutions are 0.8 for 128×128, 0.4 for 256×256, and 0.7 for 512×512. (Pre-processing and guidance sketches also follow the table.)
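
The Pseudocode row notes that Appendix B gives PyTorch-style pseudocode for DPT. For orientation only, below is a minimal sketch of the three-stage loop implied by the rows above: train a semi-supervised classifier, pseudo-label the data for a conditional diffusion model, then augment the classifier with generated pseudo images. The helpers train_classifier, train_conditional_diffusion, and sample_with_cfg are hypothetical placeholders, not the authors' API.

```python
# Minimal PyTorch-style sketch of the three-stage DPT loop (illustrative only).
# train_classifier, train_conditional_diffusion, and sample_with_cfg are
# hypothetical placeholders standing in for the authors' actual training code.
import torch

def dpt_pipeline(labeled_loader, unlabeled_loader, num_classes, K=128, cfg_scale=0.4):
    # Stage 1: semi-supervised classifier (e.g., an MSN-based ViT) trained
    # on the few labeled images together with the unlabeled pool.
    classifier = train_classifier(labeled_loader, unlabeled_loader)

    # Pseudo-label every unlabeled image with the stage-1 classifier.
    pseudo_labels = []
    with torch.no_grad():
        for images, _ in unlabeled_loader:
            pseudo_labels.append(classifier(images).argmax(dim=1))

    # Stage 2: label-conditional diffusion model (e.g., U-ViT) trained on
    # the images paired with their pseudo labels.
    diffusion = train_conditional_diffusion(unlabeled_loader, pseudo_labels)

    # Stage 3: generate K pseudo images per class with classifier-free
    # guidance and retrain the classifier on real plus pseudo data.
    pseudo_images = {
        c: sample_with_cfg(diffusion, label=c, n=K, scale=cfg_scale)
        for c in range(num_classes)
    }
    classifier = train_classifier(labeled_loader, unlabeled_loader,
                                  extra_data=pseudo_images)
    return classifier, diffusion
```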
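
The Dataset Splits row states that the labeled/unlabeled split follows the baselines [17, 16], so the official splits should be taken from those works or from the released code. The snippet below is only an illustrative way to pick one, two, or five labeled images per class from an ImageFolder-style sample list; few_label_split is a hypothetical helper, not the paper's split.

```python
# Illustrative few-labels-per-class selection (not the official split, which
# follows the baselines [17, 16]). Operates on (path, class_index) pairs.
import random
from collections import defaultdict

def few_label_split(samples, labels_per_class=1, seed=0):
    """Return (labeled_indices, unlabeled_indices) for a list of (path, class) pairs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, (_, cls) in enumerate(samples):
        by_class[cls].append(idx)
    labeled = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        labeled.extend(idxs[:labels_per_class])
    unlabeled = sorted(set(range(len(samples))) - set(labeled))
    return sorted(labeled), unlabeled
```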
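
The Experiment Setup row quotes the pre-processing (resize to 256×256, then center-crop to 224×224) and the tuned classifier-free guidance scales (0.8 / 0.4 / 0.7 for the 128, 256, and 512 resolutions). A hedged sketch of both pieces follows; the exact resize mode and the exact guidance convention in the released U-ViT code may differ, and guided_eps is an illustrative helper under one common CFG formulation.

```python
# Hedged sketch of the quoted pre-processing and of one common
# classifier-free guidance rule; not the authors' exact implementation.
from torchvision import transforms

# "Resized to 256x256 and then center-cropped to 224x224" (first/third stages).
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),  # exact resize mode in the official code may differ
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def guided_eps(eps_cond, eps_uncond, scale=0.4):
    # One common convention: scale = 0 recovers the purely conditional
    # prediction, which would be consistent with the sub-1 scales reported
    # (0.8 / 0.4 / 0.7); the convention used in the released code may differ.
    return eps_cond + scale * (eps_cond - eps_uncond)
```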