reproducibilityindex.ai

Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift

Authors: Jiayun Wu, Jiashuo Liu, Peng Cui, Steven Z. Wu

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We propose MC-Pseudolabel, a post-processing algorithm to achieve both extended multicalibration and out-of-distribution generalization. The algorithm, with lightweight hyperparameters and optimization through a series of supervised regression steps, achieves superior performance on real-world datasets with distribution shift.
Researcher Affiliation	Academia	Jiayun Wu Depart. of Computer Science & Tech. Tsinghua University Beijing, China 100084 wujy22@mails.tsinghua.edu.cn Jiashuo Liu Depart. of Computer Science & Tech. Tsinghua University Beijing, China 100084 liujiashuo77@gmail.com Peng Cui Key Laboratory of Pervasive Computing, Ministry of Education Depart. of Computer Science & Tech., Tsinghua University Beijing, China 100084 cuip@tsinghua.edu.cn Zhiwei Steven Wu School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 zhiweiw@cs.cmu.edu
Pseudocode	Yes	Algorithm 1 MC-Pseudolabel. Require: A dataset D = (D_x, D_y), a grouping function class H, a predictive function class F. 1: t ← 0; 2: f_0 ← Initialization; {For example, models trained with ERM.} 3: m ← |Range(Discretize(f_0))|; 4: repeat...
Open Source Code	Yes	Code available at: https://github.com/IC-hub/MC-Pseudolabel
Open Datasets	Yes	We experiment on Poverty Map [44] and ACSIncome [7] for the multi-environment setting, and Vessel Power [33] for the single-environment setting.
Dataset Splits	Yes	We select the best model across hyperparameters based on three model selection criteria, including in-distribution validation on the average of training data, worst-environment validation with the worst performance across training environments, and oracle validation on target data.
Hardware Specification	Yes	Each experiment with a single set of hyperparameters is run on one NVIDIA GeForce RTX 3090 with 24GB of memory, taking at most 15 minutes.
Software Dependencies	No	Our experiments are based on the architecture of PyTorch [35].
Experiment Setup	Yes	Table 4: Hyperparameters for model architecture.
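
The three model selection criteria quoted in the Dataset Splits row can be illustrated with a minimal Python sketch. It is not taken from the paper's code: the function name `select_model`, the dictionary layout, the config names, and all error values are hypothetical, and it assumes that lower error is better and that in-distribution validation can be approximated by an unweighted mean over per-environment validation errors.

```python
from statistics import mean


def select_model(candidates, criterion):
    """Return the name of the candidate with the lowest selection score.

    candidates maps a (hypothetical) hyperparameter-config name to a dict with:
      'train_env_val': validation errors, one per training environment
      'target':        error on target data (used only by the oracle criterion)
    """
    def score(result):
        if criterion == "in_distribution":
            # Assumption: unweighted average over training environments.
            return mean(result["train_env_val"])
        if criterion == "worst_environment":
            # Worst (largest) error across training environments.
            return max(result["train_env_val"])
        if criterion == "oracle":
            # Uses target data directly; an upper bound on selection quality.
            return result["target"]
        raise ValueError(f"unknown criterion: {criterion}")

    return min(candidates, key=lambda name: score(candidates[name]))


# Illustrative numbers only; these are not results from the paper.
candidates = {
    "config_a": {"train_env_val": [0.10, 0.50], "target": 0.40},
    "config_b": {"train_env_val": [0.30, 0.34], "target": 0.45},
    "config_c": {"train_env_val": [0.35, 0.36], "target": 0.30},
}

selected = {
    criterion: select_model(candidates, criterion)
    for criterion in ("in_distribution", "worst_environment", "oracle")
}
```

With these made-up values each criterion picks a different configuration, which shows why the choice of selection criterion matters when reporting results under distribution shift.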