Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift
Authors: Jiayun Wu, Jiashuo Liu, Peng Cui, Steven Z. Wu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose MC-Pseudolabel, a post-processing algorithm to achieve both extended multicalibration and out-of-distribution generalization. The algorithm, with lightweight hyperparameters and optimization through a series of supervised regression steps, achieves superior performance on real-world datasets with distribution shift. |
| Researcher Affiliation | Academia | Jiayun Wu, Depart. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, wujy22@mails.tsinghua.edu.cn; Jiashuo Liu, Depart. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, liujiashuo77@gmail.com; Peng Cui, Key Laboratory of Pervasive Computing, Ministry of Education, Depart. of Computer Science & Tech., Tsinghua University, Beijing, China 100084, cuip@tsinghua.edu.cn; Zhiwei Steven Wu, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, zhiweiw@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 MC-Pseudolabel. Require: A dataset D = (D_x, D_y), a grouping function class H, a predictive function class F. 1: t ← 0; 2: f_0 ← Initialization; {For example, models trained with ERM.} 3: m ← \|Range(Discretize(f_0))\|; 4: repeat... |
| Open Source Code | Yes | Code available at: https://github.com/IC-hub/MC-Pseudolabel |
| Open Datasets | Yes | We experiment on Poverty Map [44] and ACSIncome [7] for the multi-environment setting, and Vessel Power [33] for the single-environment setting. |
| Dataset Splits | Yes | We select the best model across hyperparameters based on three model selection criteria, including in-distribution validation on the average of training data, worst-environment validation with the worst performance across training environments, and oracle validation on target data. |
| Hardware Specification | Yes | Each experiment with a single set of hyperparameters is run on one NVIDIA GeForce RTX 3090 with 24GB of memory, taking at most 15 minutes. |
| Software Dependencies | No | Our experiments are based on the architecture of PyTorch [35]. |
| Experiment Setup | Yes | Table 4: Hyperparameters for model architecture. |
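
The three model selection criteria quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not taken from the authors' code: the `Run` record, its field names, the configuration labels, and all error values are hypothetical. It assumes each hyperparameter setting has a validation error per training environment (lower is better) and a single error on the target data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Run:
    """One hyperparameter setting and its errors (hypothetical structure)."""
    name: str
    val_error: Dict[str, float]  # validation error per training environment
    val_size: Dict[str, int]     # validation sample count per training environment
    target_error: float          # error on target data (used by oracle selection only)


def in_distribution(run: Run) -> float:
    # Average error over the pooled validation data of all training environments.
    total = sum(run.val_size.values())
    return sum(run.val_error[e] * run.val_size[e] for e in run.val_error) / total


def worst_environment(run: Run) -> float:
    # Error on the training environment where the model performs worst.
    return max(run.val_error.values())


def oracle(run: Run) -> float:
    # Error measured directly on target data.
    return run.target_error


def select(runs: List[Run], criterion: Callable[[Run], float]) -> Run:
    # Pick the run with the lowest value of the chosen criterion.
    return min(runs, key=criterion)


# Made-up numbers, chosen so that the three criteria disagree.
sizes = {"env1": 100, "env2": 100}
runs = [
    Run("config_a", {"env1": 0.20, "env2": 0.40}, sizes, 0.40),
    Run("config_b", {"env1": 0.28, "env2": 0.30}, sizes, 0.45),
    Run("config_c", {"env1": 0.10, "env2": 0.45}, sizes, 0.60),
]

best_id = select(runs, in_distribution).name
best_worst = select(runs, worst_environment).name
best_oracle = select(runs, oracle).name
print(best_id, best_worst, best_oracle)
```

With these made-up numbers the criteria pick three different settings, which is why results are typically reported per criterion. Oracle validation relies on target data, so it is best read as an upper bound on what model selection could achieve rather than as a deployable procedure.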