Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

You can't handle the (dirty) truth: Data-centric Insights Improve Pseudo-Labeling

Authors: Nabeel Seedat, Nicolas Huynh, Fergus Imrie, Mihaela van der Schaar

DMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now empirically investigate multiple aspects of DIPS... We evaluate the effectiveness of DIPS on 12 different real-world tabular datasets... We explore an extension of DIPS to images, highlighting its versatility. Setup. We investigate the use of DIPS to improve pseudo-labeling for CIFAR-10N (Wei et al., 2022a)."
Researcher Affiliation | Academia | Nabeel Seedat EMAIL University of Cambridge, Cambridge, UK; Nicolas Huynh EMAIL University of Cambridge, Cambridge, UK; Fergus Imrie EMAIL University of California, Los Angeles, CA, USA; Mihaela van der Schaar EMAIL University of Cambridge, Cambridge, UK
Pseudocode | Yes | "Algorithm 1: Plug DIPS into any pseudo-labeler"
Open Source Code | Yes | https://github.com/seedatnabeel/DIPS or https://github.com/vanderschaarlab/DIPS
Open Datasets | Yes | "Datasets. The tabular datasets are drawn from a variety of domains (e.g. healthcare, finance)... For example, Covid-19 (Baqui et al., 2020), MAGGIC (Pocock et al., 2013), SEER (Duggan et al., 2016), and CUTRACT (Prostate Cancer PCUK, 2019) are medical datasets. COMPAS (Angwin et al., 2016) is a recidivism dataset. Credit is a financial default dataset from a Taiwan bank (Yeh and Lien, 2009). Higgs is a physics dataset (Baldi et al., 2014)... We investigate the use of DIPS to improve pseudo-labeling for CIFAR-10N (Wei et al., 2022a)."
Dataset Splits | Yes | "We report results in Fig. 4 across 50 random seeds with different data splits with a fixed proportion of Dlab : Dunlab of 0.1:0.9."
Hardware Specification | Yes | "Figure 8: (a) DIPS improves the time efficiency (hours reported on a V100 GPU) of FixMatch, by 1.5-4X for the same performance (lower is better)."
Software Dependencies | No | The paper mentions using an 'XGBoost backbone' and 'WideResNet-28' but does not specify version numbers for these or any other software dependencies. Specific version numbers are required for a reproducible description of ancillary software.
Experiment Setup | No | The paper specifies a fixed Dlab : Dunlab proportion of 0.1:0.9 with 50 random seeds for the tabular datasets, nlab = 1000 over three seeds for the image datasets, and model architectures such as XGBoost and WideResNet-28. However, it does not explicitly provide hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings in the main text.
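To make the reported split protocol concrete, the sketch below reproduces the fixed Dlab : Dunlab = 0.1 : 0.9 split reshuffled per seed, and a generic confidence-threshold pseudo-labeling loop with a pluggable sample-selection hook. This is a hypothetical illustration of where a data-centric selection step (like DIPS) would plug in, not the paper's Algorithm 1; the toy nearest-centroid model and the `select` hook signature are assumptions.

```python
import numpy as np

def split_lab_unlab(n, lab_frac=0.1, seed=0):
    """Fixed-proportion labeled/unlabeled split, reshuffled per seed
    (mirrors the reported Dlab : Dunlab = 0.1 : 0.9 over 50 seeds)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_lab = int(round(lab_frac * n))
    return idx[:n_lab], idx[n_lab:]

def fit_centroids(X, y):
    # Toy stand-in for the paper's backbones (e.g. XGBoost): class centroids.
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_proba(model, X):
    # Softmax over negative squared distances to each class centroid.
    classes, cents = model
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    p = np.exp(-d + d.min(axis=1, keepdims=True))
    return classes, p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, rounds=3, thresh=0.9, select=None):
    """Generic pseudo-labeling loop. `select` (hypothetical hook) filters
    which confident pseudo-labeled samples are actually kept; a DIPS-style
    method would apply its data-quality characterization here."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit_centroids(X, y)
        classes, p = predict_proba(model, pool)
        conf = p.max(axis=1)
        yhat = classes[p.argmax(axis=1)]
        keep = conf >= thresh
        if select is not None:
            keep &= select(pool, yhat, conf)
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, yhat[keep]])
        pool = pool[~keep]
    return fit_centroids(X, y)
```

Repeating `split_lab_unlab` for seeds 0..49 and averaging test metrics reproduces the 50-seed reporting protocol described above.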