Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels
Authors: Zebin You, Yong Zhong, Fan Bao, Jiacheng Sun, Chongxuan Li, Jun Zhu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, DPT consistently achieves SOTA performance of semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fréchet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256×256. Besides, DPT outperforms competitive semi-supervised baselines substantially on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively. |
| Researcher Affiliation | Collaboration | Zebin You (1,2), Yong Zhong (1,2), Fan Bao (3), Jiacheng Sun (4), Chongxuan Li (1,2), Jun Zhu (3). 1: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; 2: Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; 3: Dept. of Comp. Sci. & Tech., BNRist Center, THU-Bosch ML Center, Tsinghua University; 4: Huawei Noah's Ark Lab |
| Pseudocode | Yes | We provide the pseudocode of DPT in the style of PyTorch in Appendix B. A minimal reconstruction of the three-stage loop is sketched after this table. |
| Open Source Code | Yes | Our code is available at https://github.com/ML-GSAI/DPT. |
| Open Datasets | Yes | We evaluate DPT on the ImageNet [20] dataset, which consists of 1,281,167 training and 50,000 validation images. and We also evaluate DPT on CIFAR-10 (see detailed experiments in Appendix A). |
| Dataset Splits | Yes | We evaluate DPT on the ImageNet [20] dataset, which consists of 1,281,167 training and 50,000 validation images. and The labeled and unlabeled data split is the same as that of corresponding methods [17, 16]. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU or CPU models. |
| Software Dependencies | No | We provide the pseudocode of DPT in the style of PyTorch in Appendix B. The paper mentions PyTorch but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | In the first and third stages, we use the same pre-processing protocol for real images as the baselines [17, 16]. For instance, in MSN, the real data are resized to 256×256 and then center-cropped to 224×224. In the second stage, real images are center-cropped to the target resolution following [5]. In the third stage, we consider pseudo images at resolution 256×256 and center-crop them to 224×224. and For a fair comparison, we use the exact same architectures and hyperparameters as the baselines [17, 16, 5]. In particular, for MSN-based DPT, we use a ViT-B/4 (or a ViT-L/7) model [17] for classification and a U-ViT-Large (or a U-ViT-Huge) model [5] for conditional generation. and We conduct detailed ablation experiments on the number of augmented pseudo images per class (i.e., K) and the classifier-free guidance scale (i.e., CFG) in Appendix G and find that the optimal K value is 128 and the optimal CFG values for different ImageNet resolutions are 0.8 for 128×128, 0.4 for 256×256, and 0.7 for 512×512. A sketch of this pre-processing and guidance setup also appears after this table. |
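
The paper's Appendix B provides PyTorch-style pseudocode for DPT; the sketch below is our own minimal reconstruction of the three-stage loop from the descriptions quoted above, not the authors' released code. The helpers `train_classifier`, `train_diffusion`, and `sample_images` are hypothetical placeholders standing in for the MSN classifier training, U-ViT conditional diffusion training, and guided sampling steps.

```python
from typing import Callable, Iterable

import torch


def dpt_pipeline(
    labeled_loader: Iterable,    # the few labeled (image, label) pairs per class
    unlabeled_loader: Iterable,  # the remaining unlabeled images
    train_classifier: Callable,  # hypothetical: semi-supervised training (e.g. MSN)
    train_diffusion: Callable,   # hypothetical: conditional diffusion training (e.g. U-ViT)
    sample_images: Callable,     # hypothetical: classifier-free-guidance sampling
    num_classes: int = 1000,
    k: int = 128,                # augmented pseudo images per class (the paper's optimal K)
):
    # Stage 1: train a semi-supervised classifier on the partially labeled
    # data, then predict a pseudo-label for every unlabeled image.
    classifier = train_classifier(labeled_loader, unlabeled_loader)
    with torch.no_grad():
        pseudo_labeled = [(x, classifier(x).argmax(dim=-1)) for x in unlabeled_loader]

    # Stage 2: train a conditional diffusion model on (image, pseudo-label) pairs.
    diffusion = train_diffusion(pseudo_labeled)

    # Stage 3: sample K pseudo images per class, then retrain the classifier on
    # the real labeled data augmented with the generated samples.
    pseudo_images = [(sample_images(diffusion, y, n=k), y) for y in range(num_classes)]
    classifier = train_classifier(labeled_loader, pseudo_images)
    return classifier, diffusion
```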
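
The pre-processing quoted in the Experiment Setup row maps directly onto standard torchvision transforms; the snippet below sketches that protocol plus one common classifier-free-guidance combination rule, using the scales reported in the paper's ablation. The `(1 + s) * cond - s * uncond` form is an assumption about the exact CFG convention, which the quoted text does not spell out.

```python
import torch
from torchvision import transforms

# Stages 1 and 3: real images are resized to 256 and center-cropped to 224
# before being fed to the MSN classifier (per the quoted setup).
classifier_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Optimal guidance scale per ImageNet resolution, from the paper's Appendix G ablation.
CFG_SCALE = {128: 0.8, 256: 0.4, 512: 0.7}


def guided_eps(eps_cond: torch.Tensor, eps_uncond: torch.Tensor, resolution: int) -> torch.Tensor:
    """Combine conditional and unconditional noise predictions.

    Assumes the (1 + s) * cond - s * uncond convention; the paper reports
    only the scale values, not the formula itself.
    """
    s = CFG_SCALE[resolution]
    return (1.0 + s) * eps_cond - s * eps_uncond
```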