Regression with Multi-Expert Deferral
Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we report the results of extensive experiments showing the effectiveness of our proposed algorithms. |
| Researcher Affiliation | Collaboration | ¹Courant Institute of Mathematical Sciences, New York, NY; ²Google Research, New York, NY. Correspondence to: Anqi Mao <aqmao@cims.nyu.edu>, Mehryar Mohri <mohri@google.com>, Yutao Zhong <yutao@cims.nyu.edu>. |
| Pseudocode | No | The paper contains mathematical derivations and loss function definitions, but no clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps. |
| Open Source Code | No | No statement providing concrete access to source code (e.g., a repository link or explicit code release statement) for the methodology described in this paper was found. |
| Open Datasets | Yes | In this section, we report the empirical results for our single-stage and two-stage algorithms for regression with deferral on three datasets from the UCI machine learning repository (Asuncion & Newman, 2007), the Airfoil, Housing and Concrete, which have also been studied in (Cheng et al., 2023). |
| Dataset Splits | Yes | For each dataset, we randomly split it into a training set of 60% examples, a validation set of 20% examples and a test set of 20% examples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | We used the Adam optimizer (Kingma & Ba, 2014) with a batch size of 256 and 2,000 training epochs. We adopted the squared loss as the regression loss (L = L_2). For our single-stage surrogate loss (2) and two-stage surrogate loss (3), we choose ℓ = ℓ_log as the logistic loss. No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are provided. |
| Experiment Setup | Yes | We used the Adam optimizer (Kingma & Ba, 2014) with a batch size of 256 and 2,000 training epochs. The learning rate for all datasets is selected from {0.01, 0.05, 0.1}. We adopted the squared loss as the regression loss (L = L_2). For our single-stage surrogate loss (2) and two-stage surrogate loss (3), we choose ℓ = ℓ_log as the logistic loss. In the experiments, we considered two types of costs: c_j(x, y) = L(g_j(x), y) and c_j(x, y) = L(g_j(x), y) + α_j, for 1 ≤ j ≤ n_e. We chose (α_1, α_2, α_3) = (4.0, 8.0, 12.0). |
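
Since the paper releases no code, the snippets below are only sketches of the quoted setup. First, a minimal reconstruction of the 60/20/20 split reported in the Dataset Splits row, assuming the features `X` and targets `y` of one UCI dataset are already loaded as arrays; the function name and the fixed seed are illustrative, not from the paper.

```python
# Hedged sketch of the 60% / 20% / 20% train/validation/test split described
# in the Dataset Splits row. X and y are assumed to be array-like (anything
# scikit-learn accepts); the paper does not specify a random seed.
from sklearn.model_selection import train_test_split

def split_60_20_20(X, y, seed=0):
    # Carve off the 60% training portion first.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.6, random_state=seed)
    # Split the remaining 40% evenly: 20% validation and 20% test overall.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```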
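Second, a hedged sketch of the cost construction and training hyperparameters quoted in the Experiment Setup row: c_j(x, y) = L(g_j(x), y) + α_j with L the squared loss and (α_1, α_2, α_3) = (4.0, 8.0, 12.0), trained with Adam, batch size 256, 2,000 epochs, and a learning rate selected from {0.01, 0.05, 0.1}. The surrogate losses (2) and (3) themselves are not reproduced here because their exact form is not quoted above; the predictor architecture, input dimension, and tensor shapes below are assumptions.

```python
# Hedged PyTorch sketch, not the authors' implementation (no code is released).
import torch

ALPHAS = torch.tensor([4.0, 8.0, 12.0])  # (alpha_1, alpha_2, alpha_3) from the paper

def deferral_costs(expert_preds, y, alphas=ALPHAS):
    """Costs c_j(x, y) = L(g_j(x), y) + alpha_j with L the squared loss.

    expert_preds: (batch, n_experts) expert predictions g_j(x).
    y:            (batch,) regression targets.
    Returns a (batch, n_experts) tensor of deferral costs.
    """
    sq_err = (expert_preds - y.unsqueeze(1)) ** 2  # L(g_j(x), y) with L = squared loss
    return sq_err + alphas                          # add the constant offsets alpha_j

# Training configuration quoted in the table; the network is a placeholder
# since the quoted text does not describe the architecture or input dimension.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # lr tuned over {0.01, 0.05, 0.1}
BATCH_SIZE, NUM_EPOCHS = 256, 2000
```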