Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer
Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically evaluate our proposed surrogate losses and compare them with existing baselines. In this section, we empirically evaluate our proposed surrogate losses and compare them with existing baselines. |
| Researcher Affiliation | Collaboration | Anqi Mao, Courant Institute, New York, NY 10012; Mehryar Mohri, Google Research & CIMS, New York, NY 10011; Yutao Zhong, Courant Institute, New York, NY 10012 |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology described is open-source or publicly available. |
| Open Datasets | Yes | We follow the setting of Mozannar et al. [2023] and conduct experiments on a synthetic dataset: Mixture-of-Gaussians [Mozannar et al., 2023], and three real-world datasets: CIFAR-10H [Battleday et al., 2020], Hate Speech [Davidson et al., 2017], and COMPASS [Dressel and Farid, 2018]. |
| Dataset Splits | Yes | Each dataset is randomly split into 70%, 10%, and 20% for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU model, CPU type) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific software packages or frameworks, but it does not provide specific version numbers for these dependencies (e.g., 'PyTorch 1.9' or 'Python 3.8'). |
| Experiment Setup | No | The paper states 'We use the same optimizer, learning rate, and number of epochs as chosen in [Mozannar et al., 2023]', which refers to an external source for hyperparameters rather than listing them explicitly in the main text of this paper. |
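
The random 70%/10%/20% split reported under Dataset Splits can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `split_indices`, the `seed` argument, and the rounding rule are assumptions, since the table records that no source code, random seed, or implementation details are given.

```python
import random


def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test lists.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    indices = list(range(n))
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)

    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # The test set takes whatever remains, so every index is used exactly once.
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Fixing the seed makes the split repeatable across runs. A report that also stated the seed would let readers recover the exact partition rather than only its proportions.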