Regression with Multi-Expert Deferral

Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, we report the results of extensive experiments showing the effectiveness of our proposed algorithms.
Researcher Affiliation | Collaboration | (1) Courant Institute of Mathematical Sciences, New York, NY; (2) Google Research, New York, NY. Correspondence to: Anqi Mao <aqmao@cims.nyu.edu>, Mehryar Mohri <mohri@google.com>, Yutao Zhong <yutao@cims.nyu.edu>.
Pseudocode | No | The paper contains mathematical derivations and loss function definitions, but no clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | No | No statement providing concrete access to source code (e.g., a repository link or an explicit code-release statement) for the methodology described in this paper was found.
Open Datasets | Yes | In this section, we report the empirical results for our single-stage and two-stage algorithms for regression with deferral on three datasets from the UCI machine learning repository (Asuncion & Newman, 2007): Airfoil, Housing, and Concrete, which have also been studied in (Cheng et al., 2023).
Dataset Splits | Yes | For each dataset, we randomly split it into a training set of 60% of examples, a validation set of 20% of examples, and a test set of 20% of examples. (A split sketch follows this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | We used the Adam optimizer (Kingma & Ba, 2014) with a batch size of 256 and 2,000 training epochs. We adopted the squared loss as the regression loss (L = L2). For our single-stage surrogate loss (2) and two-stage surrogate loss (3), we choose ℓ = ℓ_log, the logistic loss. No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are provided.
Experiment Setup | Yes | We used the Adam optimizer (Kingma & Ba, 2014) with a batch size of 256 and 2,000 training epochs. The learning rate for all datasets is selected from {0.01, 0.05, 0.1}. We adopted the squared loss as the regression loss (L = L2). For our single-stage surrogate loss (2) and two-stage surrogate loss (3), we choose ℓ = ℓ_log, the logistic loss. In the experiments, we considered two types of costs: c_j(x, y) = L(g_j(x), y) and c_j(x, y) = L(g_j(x), y) + α_j, for 1 ≤ j ≤ n_e. We chose (α_1, α_2, α_3) = (4.0, 8.0, 12.0). (A cost sketch follows this table.)
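
For concreteness, the 60% / 20% / 20% split reported above could be reproduced along the following lines. This is a minimal sketch, not the authors' code: it assumes scikit-learn and generic NumPy arrays `X` (features) and `y` (targets), and the paper does not specify its splitting implementation or random seeds.

```python
# Hypothetical reconstruction of the reported 60/20/20 split (not from the paper).
# Assumes numpy arrays X (features) and y (targets); the seed choice is arbitrary.
from sklearn.model_selection import train_test_split

def split_dataset(X, y, seed=0):
    # Hold out 40% of the examples, keeping 60% for training.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=seed)
    # Split the held-out 40% evenly into validation (20%) and test (20%).
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```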
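
The deferral costs quoted in the Experiment Setup row can likewise be sketched. The snippet below only computes c_j(x, y) = L(g_j(x), y) + α_j with L the squared loss and (α_1, α_2, α_3) = (4.0, 8.0, 12.0); the expert predictors g_j and the single-stage and two-stage surrogate losses (2) and (3) are defined in the paper and are not reproduced here. PyTorch and the function name `deferral_costs` are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the cost terms c_j(x, y) = L(g_j(x), y) + alpha_j
# with L the squared loss. Passing alphas of zero recovers the first cost type,
# c_j(x, y) = L(g_j(x), y). Not the authors' code.
import torch

ALPHAS = torch.tensor([4.0, 8.0, 12.0])  # (alpha_1, alpha_2, alpha_3) reported in the paper

def deferral_costs(expert_preds, y, alphas=ALPHAS):
    # expert_preds: (batch, n_experts) tensor of expert predictions g_j(x).
    # y: (batch,) tensor of regression targets.
    squared_loss = (expert_preds - y.unsqueeze(1)) ** 2  # L(g_j(x), y) per expert
    return squared_loss + alphas  # broadcast alpha_j across the batch
```

A surrogate built on top of these costs would then be trained with the reported settings: Adam, batch size 256, 2,000 epochs, and a learning rate selected on the validation set from {0.01, 0.05, 0.1}.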