Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization

Authors: Yongqiang Chen, Kaiwen Zhou, Yatao Bian, Binghui Xie, Bingzhe Wu, Yonggang Zhang, Kaili Ma, Han Yang, Peilin Zhao, Bo Han, James Cheng

ICLR 2023 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on challenging benchmarks, WILDS, show that PAIR alleviates the compromises and yields top OOD performances. |
| Researcher Affiliation | Collaboration | Yongqiang Chen¹, Kaiwen Zhou¹, Yatao Bian², Binghui Xie¹, Bingzhe Wu², Yonggang Zhang³, Han Yang¹, Kaili Ma¹, Peilin Zhao², Bo Han³, James Cheng¹ — ¹The Chinese University of Hong Kong, ²Tencent AI Lab, ³Hong Kong Baptist University |
| Pseudocode | Yes | Algorithm 1: Pseudo code for PAIR-o. |
| Open Source Code | Yes | Code is available at https://github.com/LFhase/PAIR. |
| Open Datasets | Yes | We select 6 challenging datasets from the WILDS (Koh et al., 2021) benchmark for evaluating PAIR-o performance under realistic distribution shifts. The datasets cover domain distribution shifts, subpopulation shifts, and their mixture. A summary of the basic information and statistics of the WILDS datasets can be found in Table 8 and Table 9, respectively. |
| Dataset Splits | Yes | By default, we repeat the experiments over 3 runs with random seeds 0, 1, 2. For Camelyon17, we follow the official guide and repeat 10 times with random seeds 0 to 9; for PovertyMap, we repeat the experiments 5 times with random seeds 0 to 4. Specifically, to construct the validation set, the data from each domain is first split into 80% (for training and evaluation) and 20% (for validation and model selection). |
| Hardware Specification | Yes | Specifically, we run COLOREDMNIST experiments on Linux servers with NVIDIA RTX 3090Ti graphics cards with CUDA 11.3, a 40-core Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 256 GB memory, and Ubuntu 18.04 LTS installed. For the WILDS and DOMAINBED experiments, we run on Linux servers with NVIDIA V100 graphics cards with CUDA 10.2. |
| Software Dependencies | Yes | We implement our methods with PyTorch (Paszke et al., 2019). For the software and hardware configurations, we ensure consistent environments for each dataset. Specifically, we run COLOREDMNIST experiments on Linux servers with NVIDIA RTX 3090Ti graphics cards with CUDA 11.3... For the WILDS and DOMAINBED experiments, we run on Linux servers with NVIDIA V100 graphics cards with CUDA 10.2. |
| Experiment Setup | Yes | The general hyperparameter settings inherit from the referenced code and papers and are shown in Table 11. Table 11: General hyperparameter settings for the experiments on WILDS. (Includes Learning rate, Weight decay, Batch size, Optimizer, Pretraining Step, Maximum Epoch) |
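The split protocol quoted above (a per-domain 80/20 split, repeated over fixed random seeds) can be sketched as follows. This is a minimal illustration, not the authors' code: the function and variable names (`split_domain`, `run_experiments`, the toy `domains` dict) are hypothetical, and only the 80/20 fractions and the seed lists come from the paper.

```python
# Minimal sketch of the reported split protocol: each domain's data is
# split 80% (training/evaluation) / 20% (validation, model selection),
# and the whole experiment is repeated across fixed random seeds.
# Names here are illustrative, not from the authors' repository.
import random


def split_domain(indices, seed, val_frac=0.2):
    """Shuffle one domain's example indices and split them 80/20."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    idx = list(indices)
    rng.shuffle(idx)
    n_val = int(len(idx) * val_frac)
    return idx[n_val:], idx[:n_val]  # (train+eval indices, validation indices)


def run_experiments(domains, seeds):
    """Repeat the per-domain split once per seed, as in the protocol."""
    return {
        seed: {name: split_domain(ex, seed) for name, ex in domains.items()}
        for seed in seeds
    }


# Default protocol: 3 runs with seeds 0, 1, 2
# (10 runs with seeds 0-9 for Camelyon17, 5 runs with seeds 0-4 for PovertyMap).
domains = {"domain_a": range(100), "domain_b": range(80)}  # toy data
splits = run_experiments(domains, seeds=[0, 1, 2])
```

For example, with 100 examples in a domain, each seed yields 80 train/eval indices and 20 validation indices, and every index appears in exactly one of the two partitions.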