Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

You can't handle the (dirty) truth: Data-centric Insights Improve Pseudo-Labeling

Authors: Nabeel Seedat, Nicolas Huynh, Fergus Imrie, Mihaela van der Schaar

DMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now empirically investigate multiple aspects of DIPS... We evaluate the effectiveness of DIPS on 12 different real-world tabular datasets... We explore an extension of DIPS to images, highlighting its versatility. Setup. We investigate the use of DIPS to improve pseudo-labeling for CIFAR-10N (Wei et al., 2022a)."
Researcher Affiliation | Academia | Nabeel Seedat EMAIL University of Cambridge, Cambridge, UK; Nicolas Huynh EMAIL University of Cambridge, Cambridge, UK; Fergus Imrie EMAIL University of California, Los Angeles, CA, USA; Mihaela van der Schaar EMAIL University of Cambridge, Cambridge, UK
Pseudocode | Yes | "Algorithm 1: Plug DIPS into any pseudo-labeler"
Open Source Code | Yes | https://github.com/seedatnabeel/DIPS or https://github.com/vanderschaarlab/DIPS
Open Datasets | Yes | "Datasets. The tabular datasets are drawn from a variety of domains (e.g. healthcare, finance)... For example, Covid-19 (Baqui et al., 2020), MAGGIC (Pocock et al., 2013), SEER (Duggan et al., 2016), and CUTRACT (Prostate Cancer PCUK, 2019) are medical datasets. COMPAS (Angwin et al., 2016) is a recidivism dataset. Credit is a financial default dataset from a Taiwan bank (Yeh and Lien, 2009). Higgs is a physics dataset (Baldi et al., 2014)... We investigate the use of DIPS to improve pseudo-labeling for CIFAR-10N (Wei et al., 2022a)."
Dataset Splits | Yes | "We report results in Fig. 4 across 50 random seeds with different data splits with a fixed proportion of Dlab : Dunlab of 0.1:0.9."
Hardware Specification | Yes | "Figure 8: (a) DIPS improves the time efficiency (hours reported on a V100 GPU) of FixMatch, by 1.5-4X for the same performance (lower is better)."
Software Dependencies | No | The paper mentions using an 'XGBoost backbone' and 'WideResNet-28' but does not specify version numbers for these or any other software dependencies. Specific version numbers are required for a reproducible description of ancillary software.
Experiment Setup | No | The paper specifies a fixed Dlab : Dunlab proportion of 0.1:0.9 with 50 random seeds for the tabular datasets, nlab = 1000 over three seeds for the image datasets, and model architectures such as XGBoost and WideResNet-28. However, it does not explicitly provide hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings in the main text.
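To make the reported split protocol concrete, the sketch below reproduces the fixed Dlab : Dunlab = 0.1 : 0.9 split reshuffled per seed, and a generic confidence-threshold pseudo-labeling loop with a pluggable sample-selection hook. This is a hypothetical illustration of where a data-centric selection step (like DIPS) would plug in, not the paper's Algorithm 1; the toy nearest-centroid model and the `select` hook signature are assumptions.

```python
import numpy as np

def split_lab_unlab(n, lab_frac=0.1, seed=0):
    """Fixed-proportion labeled/unlabeled split, reshuffled per seed
    (mirrors the reported Dlab : Dunlab = 0.1 : 0.9 over 50 seeds)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_lab = int(round(lab_frac * n))
    return idx[:n_lab], idx[n_lab:]

def fit_centroids(X, y):
    # Toy stand-in for the paper's backbones (e.g. XGBoost): class centroids.
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_proba(model, X):
    # Softmax over negative squared distances to each class centroid.
    classes, cents = model
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    p = np.exp(-d + d.min(axis=1, keepdims=True))
    return classes, p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, rounds=3, thresh=0.9, select=None):
    """Generic pseudo-labeling loop. `select` (hypothetical hook) filters
    which confident pseudo-labeled samples are actually kept; a DIPS-style
    method would apply its data-quality characterization here."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit_centroids(X, y)
        classes, p = predict_proba(model, pool)
        conf = p.max(axis=1)
        yhat = classes[p.argmax(axis=1)]
        keep = conf >= thresh
        if select is not None:
            keep &= select(pool, yhat, conf)
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, yhat[keep]])
        pool = pool[~keep]
    return fit_centroids(X, y)
```

Repeating `split_lab_unlab` for seeds 0..49 and averaging test metrics reproduces the 50-seed reporting protocol described above.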