Empirical or Invariant Risk Minimization? A Sample Complexity Perspective

Authors: Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In this section, we discuss classification experiments (regression experiments with similar qualitative findings are in the supplement). We use the first two environments (e = 1, 2) to train and the third environment (e = 3) to test. Other details of the training (models, hyperparameters, etc.) are in the supplement. For each of the above datasets, we run the experiments for different amounts of training data, from 1000 up to 60000 samples (10 trials for each data size)."
Researcher Affiliation: Collaboration. Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney (IBM Research, T. J. Watson Research Center, NY; Rensselaer Polytechnic Institute).
Pseudocode: No. The paper describes procedures and mathematical formulations but does not include any explicitly labeled "Algorithm" or pseudocode blocks.
Open Source Code: Yes. "The code to reproduce the results presented in this work can be found at https://github.com/IBM/OoD."
Open Datasets: Yes. Arjovsky et al. (2019) proposed the colored MNIST (CMNIST) dataset; comparisons on it showed how ERM-based models exploit spurious factors (background color).
Dataset Splits: Yes. "We use the train domain validation set procedure described in Gulrajani & Lopez-Paz (2020) to select the penalty value from the set {1e4, 3.3e4, 6.6e4, 1e5} (with 4:1 train-validation split)."
Hardware Specification: No. The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments.
Software Dependencies: No. The paper mentions tools such as 'sklearn' for regression and the 'ReLU' activation, but it does not provide specific version numbers for any software libraries, frameworks, or environments.
Experiment Setup: Yes. "We use a learning rate of 4.9e-4 and a batch size of 512 for both ERM and IRM. We use 1000 gradient steps for IRM. As was done in the original IRM work (Arjovsky et al., 2019), we use a threshold on steps (190) after which a large penalty is imposed for violating the IRM constraint."
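The setup quoted above, a large IRM penalty imposed only after a step threshold, with the penalty value chosen from a grid via a 4:1 train-domain validation split, can be sketched in plain Python. The numeric values come from the report; the function names and structure are illustrative assumptions, not the authors' implementation (which lives at https://github.com/IBM/OoD):

```python
import random

# Hyperparameters quoted in the report; the helpers below are a minimal
# sketch of how they fit together, not the authors' training code.
CONFIG = {
    "lr": 4.9e-4,            # learning rate, shared by ERM and IRM
    "batch_size": 512,
    "irm_steps": 1000,       # gradient steps for IRM
    "anneal_step": 190,      # large penalty imposed only after this step
    "penalty_grid": [1e4, 3.3e4, 6.6e4, 1e5],  # candidate penalty values
}

def irm_penalty_weight(step, penalty, anneal_step=CONFIG["anneal_step"]):
    """Weight on the IRM invariance penalty at a given gradient step:
    a nominal weight of 1.0 before the threshold, the large penalty after."""
    return penalty if step >= anneal_step else 1.0

def train_val_split(indices, val_frac=0.2, seed=0):
    """4:1 split of the training-domain examples; the held-out fifth is
    used to select the penalty value from CONFIG["penalty_grid"]."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_val = int(len(idx) * val_frac)
    return idx[n_val:], idx[:n_val]
```

This mirrors the train-domain validation procedure of Gulrajani & Lopez-Paz (2020): models are trained once per penalty value, and the value scoring best on the held-out slice of the training domains is kept.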