Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
TRIAGE: Characterizing and auditing training data for improved regression
Authors: Nabeel Seedat, Jonathan Crabbé, Zhaozhi Qian, Mihaela van der Schaar
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate the utility of TRIAGE across multiple use cases satisfying P1-P4, including consistent characterization, sculpting to improve performance in a variety of settings, as well as, guiding dataset selection and feature acquisition. Datasets. We conduct experiments on 10 real-world regression datasets with varying characteristics. |
| Researcher Affiliation | Academia | Nabeel Seedat University of Cambridge EMAIL Jonathan Crabbé University of Cambridge EMAIL Zhaozhi Qian University of Cambridge EMAIL Mihaela van der Schaar University of Cambridge EMAIL |
| Pseudocode | Yes | Algorithm 1 Computing a CPD |
| Open Source Code | Yes | Code: https://github.com/seedatnabeel/TRIAGE or https://github.com/vanderschaarlab/TRIAGE |
| Open Datasets | Yes | The datasets are drawn from diverse domains, including safety-critical medical regression: (i) Prostate cancer from the US [31] and UK [32], (ii) Hospital Length of Stay [33] and (iii) MIMIC Antibiotics [34]. Additionally, we analyze general UCI regression datasets [35], including Bike, Boston Housing, Bio, Concrete, Protein and Star. The datasets are detailed in Appendix B, along with further experimental details. |
| Dataset Splits | Yes | We partitioned each dataset into training, validation, and testing sets using an 80:10:10 split. All experiments are repeated 5 times, with different random seeds for consistency, and the average and standard deviation are reported. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions types of models used (e.g., Neural Networks, XGBoost) but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | Neural Networks (NNs) are built using a 3-layer Multi-Layer Perceptron (MLP) with 100 hidden units per layer, ReLU activation, and trained for 100 epochs using the Adam optimizer with a learning rate of 1e-3 and a batch size of 128. Early stopping is used with patience of 10 epochs. XGBoost models are trained with 1000 estimators, a learning rate of 0.1, and early stopping patience of 10 rounds. |
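
The split protocol and hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the names `MLP_CONFIG`, `XGB_CONFIG`, `SEEDS`, `split_indices`, and `summarize` are hypothetical, the seed values are assumed (the quoted text does not list them), and the use of the sample standard deviation is likewise an assumption.

```python
import random
import statistics

# Hyperparameters as quoted in the Experiment Setup row.
# The dictionary layout and key names are hypothetical.
MLP_CONFIG = {
    "layers": 3,
    "hidden_units": 100,
    "activation": "relu",
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "batch_size": 128,
    "max_epochs": 100,
    "early_stopping_patience": 10,
}
XGB_CONFIG = {
    "n_estimators": 1000,
    "learning_rate": 0.1,
    "early_stopping_rounds": 10,
}

# Assumption: five arbitrary seeds; the actual seeds are not given.
SEEDS = [0, 1, 2, 3, 4]


def split_indices(n_samples, seed, fractions=(0.8, 0.1, 0.1)):
    """Shuffle sample indices and cut them into train/val/test lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_val = int(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def summarize(scores):
    """Return (mean, sample standard deviation) across repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In use, one would call `split_indices(n, seed)` once per entry in `SEEDS`, train and evaluate a model on each split, and pass the five resulting test scores to `summarize` to obtain the reported average and standard deviation.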