Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
Authors: Avital Oliver, Augustus Odena, Colin A. Raffel, Ekin Dogus Cubuk, Ian Goodfellow
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. |
| Researcher Affiliation | Industry | Google Brain EMAIL |
| Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To help guide SSL research towards real-world applicability, we make our unified reimplementation and evaluation platform publicly available.² ... ² https://github.com/brain-research/realistic-ssl-evaluation |
| Open Datasets | Yes | We tested each SSL approach on the widely-reported image classification benchmarks of SVHN [40] with all but 1000 labels discarded and CIFAR-10 [31] with all but 4,000 labels discarded. |
| Dataset Splits | Yes | We optimized hyperparameters to minimize classification error on the standard validation set from each dataset, as is common practice (an approach we evaluate critically in section 4.6). |
| Hardware Specification | No | For every SSL technique in addition to a fully-supervised (not utilizing unlabeled data) baseline, we ran 1000 trials of Gaussian Process-based black box optimization using Google Cloud ML Engine's hyperparameter tuning service [18]. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Wide ResNet' but does not specify their version numbers or the versions of underlying libraries or programming languages. |
| Experiment Setup | Yes | We chose a Wide ResNet [52], due to their widespread adoption and availability. Specifically, we used WRN-28-2... For training, we chose the ubiquitous Adam optimizer [29]. For all datasets, we followed standard procedures for regularization, data augmentation, and preprocessing; details are in appendix B. ... An enumeration of these hyperparameter settings can be found in appendix C. |
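
The Open Datasets row describes benchmarks built by discarding all but a fixed number of labels (1000 for SVHN, 4,000 for CIFAR-10). The sketch below illustrates that idea on a toy label list. It is not the authors' code: the helper name `discard_labels`, the uniform random selection, the seed, and the toy dataset size are all assumptions made for illustration.

```python
import random


def discard_labels(labels, n_labeled, seed=0):
    """Hypothetical helper: keep `n_labeled` labels, mark the rest as None.

    Assumes labels are kept uniformly at random; the paper's own sampling
    procedure may differ.
    """
    if n_labeled > len(labels):
        raise ValueError("n_labeled exceeds dataset size")
    rng = random.Random(seed)
    # Indices of the examples whose labels survive.
    keep = set(rng.sample(range(len(labels)), n_labeled))
    return [y if i in keep else None for i, y in enumerate(labels)]


# Toy stand-in for a 10-class training set; the size is illustrative only.
toy_labels = [i % 10 for i in range(50_000)]
ssl_labels = discard_labels(toy_labels, n_labeled=4_000)
n_kept = sum(y is not None for y in ssl_labels)
print(f"labeled: {n_kept}, unlabeled: {len(ssl_labels) - n_kept}")
```

Examples whose label is `None` would be treated as the unlabeled pool by an SSL method, while the remaining examples form the small labeled set.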