Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift

Authors: Jiayun Wu, Jiashuo Liu, Peng Cui, Zhiwei Steven Wu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose MC-Pseudolabel, a post-processing algorithm to achieve both extended multicalibration and out-of-distribution generalization. With lightweight hyperparameters and optimization through a series of supervised regression steps, the algorithm achieves superior performance on real-world datasets with distribution shift.
Researcher Affiliation | Academia | Jiayun Wu, Dept. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, wujy22@mails.tsinghua.edu.cn; Jiashuo Liu, Dept. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, liujiashuo77@gmail.com; Peng Cui, Key Laboratory of Pervasive Computing, Ministry of Education, Dept. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, cuip@tsinghua.edu.cn; Zhiwei Steven Wu, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, zhiweiw@cs.cmu.edu
Pseudocode | Yes | Algorithm 1 MC-Pseudolabel. Require: a dataset D = (Dx, Dy), a grouping function class H, a predictive function class F. 1: t ← 0; 2: f0 ← Initialization; {For example, models trained with ERM.} 3: m ← |Range(Discretize(f0))|; 4: repeat ...
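The excerpted pseudocode stops at the start of the main loop, so only the general shape of MC-Pseudolabel is visible here: ERM initialization followed by a series of supervised regression steps onto recalibrated pseudolabels. The following Python is an illustrative sketch of that shape only; the quantile binning, level-set recalibration rule, and fixed round count are assumptions for demonstration, not the authors' exact algorithm.

```python
import numpy as np

def mc_pseudolabel_sketch(X, y, rounds=3):
    """Hypothetical sketch of an iterative pseudolabeling loop.

    Initializes with ERM (ordinary least squares), then repeatedly
    (1) discretizes current predictions into level sets,
    (2) shifts each level set toward its mean label to form pseudolabels,
    (3) refits the regressor on those pseudolabels.
    """
    # Step f0: ERM initialization via least squares
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(rounds):
        preds = X @ w
        # Discretize(f): bin predictions by quartiles (assumed scheme)
        edges = np.quantile(preds, [0.25, 0.5, 0.75])
        bins = np.digitize(preds, edges)
        pseudo = preds.copy()
        for b in np.unique(bins):
            mask = bins == b
            # Recalibrate each level set toward the mean observed label
            pseudo[mask] += y[mask].mean() - preds[mask].mean()
        # Supervised regression step onto the pseudolabels
        w = np.linalg.lstsq(X, pseudo, rcond=None)[0]
    return w
```

The real algorithm additionally searches over a grouping function class H and uses a convergence criterion rather than a fixed number of rounds; see the released code at https://github.com/IC-hub/MC-Pseudolabel for the actual implementation.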
Open Source Code | Yes | Code available at: https://github.com/IC-hub/MC-Pseudolabel
Open Datasets | Yes | We experiment on Poverty Map [44] and ACSIncome [7] for the multi-environment setting, and Vessel Power [33] for the single-environment setting.
Dataset Splits | Yes | We select the best model across hyperparameters based on three model selection criteria: in-distribution validation on the average of training data, worst-environment validation using the worst performance across training environments, and oracle validation on target data.
Hardware Specification | Yes | Each experiment with a single set of hyperparameters is run on one NVIDIA GeForce RTX 3090 with 24GB of memory, taking at most 15 minutes.
Software Dependencies | No | Our experiments are based on the architecture of PyTorch [35].
Experiment Setup | Yes | Table 4: Hyperparameters for model architecture.