Empirical or Invariant Risk Minimization? A Sample Complexity Perspective

Authors: Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type: Experimental. "In this section, we discuss classification experiments (regression experiments with similar qualitative findings are in the supplement). We use the first two environments (e = 1, 2) to train and the third environment (e = 3) to test. Other details of the training (models, hyperparameters, etc.) are in the supplement. For each of the above datasets, we run the experiments for different amounts of training data, from 1000 up to 60000 samples (10 trials for each data size)."
Researcher Affiliation: Collaboration. Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney (IBM Research, T. J. Watson Research Center, NY; Rensselaer Polytechnic Institute).
Pseudocode: No. The paper describes procedures and mathematical formulations but does not include any explicitly labeled "Algorithm" or pseudocode blocks.
Open Source Code: Yes. "The code to reproduce the results presented in this work can be found at https://github.com/IBM/OoD."
Open Datasets: Yes. Arjovsky et al. (2019) proposed the colored MNIST (CMNIST) dataset; comparisons on it showed how ERM-based models exploit spurious factors (background color).
Dataset Splits: Yes. "We use the train domain validation set procedure described in Gulrajani & Lopez-Paz (2020) to select the penalty value from the set {1e4, 3.3e4, 6.6e4, 1e5} (with 4:1 train-validation split)."
Hardware Specification: No. The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, used for running the experiments.
Software Dependencies: No. The paper mentions tools such as 'sklearn' for regression and the 'ReLU' activation, but it does not provide specific version numbers for any software libraries, frameworks, or environments.
Experiment Setup: Yes. "We use a learning rate of 4.9e-4 and a batch size of 512 for both ERM and IRM. We use 1000 gradient steps for IRM. As was done in the original IRM work (Arjovsky et al., 2019), we use a threshold on steps (190) after which a large penalty is imposed for violating the IRM constraint."
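The setup quoted above, a large IRM penalty imposed only after a step threshold, with the penalty value chosen from a grid via a 4:1 train-domain validation split, can be sketched in plain Python. The numeric values come from the report; the function names and structure are illustrative assumptions, not the authors' implementation (which lives at https://github.com/IBM/OoD):

```python
import random

# Hyperparameters quoted in the report; the helpers below are a minimal
# sketch of how they fit together, not the authors' training code.
CONFIG = {
    "lr": 4.9e-4,            # learning rate, shared by ERM and IRM
    "batch_size": 512,
    "irm_steps": 1000,       # gradient steps for IRM
    "anneal_step": 190,      # large penalty imposed only after this step
    "penalty_grid": [1e4, 3.3e4, 6.6e4, 1e5],  # candidate penalty values
}

def irm_penalty_weight(step, penalty, anneal_step=CONFIG["anneal_step"]):
    """Weight on the IRM invariance penalty at a given gradient step:
    a nominal weight of 1.0 before the threshold, the large penalty after."""
    return penalty if step >= anneal_step else 1.0

def train_val_split(indices, val_frac=0.2, seed=0):
    """4:1 split of the training-domain examples; the held-out fifth is
    used to select the penalty value from CONFIG["penalty_grid"]."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    n_val = int(len(idx) * val_frac)
    return idx[n_val:], idx[:n_val]
```

This mirrors the train-domain validation procedure of Gulrajani & Lopez-Paz (2020): models are trained once per penalty value, and the value scoring best on the held-out slice of the training domains is kept.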