Empirical or Invariant Risk Minimization? A Sample Complexity Perspective
Authors: Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we discuss classification experiments (regression experiments with similar qualitative findings are in the supplement). We use the first two environments (e = 1, 2) to train and the third environment (e = 3) to test. Other details of the training (models, hyperparameters, etc.) are in the supplement. For each of the above datasets, we run the experiments for different amounts of training data, from 1000 up to 60000 samples (10 trials for each data size). A sketch of this sweep protocol appears after the table. |
| Researcher Affiliation | Collaboration | Kartik Ahuja, Jun Wang, Amit Dhurandhar, Karthikeyan Shanmugam, Kush R. Varshney; IBM Research, T. J. Watson Research Center, NY; Rensselaer Polytechnic Institute |
| Pseudocode | No | The paper describes procedures and mathematical formulations but does not include any explicitly labeled 'Algorithm' or 'Pseudocode' blocks. |
| Open Source Code | Yes | The code to reproduce the results presented in this work can be found at https://github.com/IBM/OoD. |
| Open Datasets | Yes | Arjovsky et al. (2019) proposed the colored MNIST (CMNIST) dataset; comparisons on it showed how ERM-based models exploit spurious factors (background color). |
| Dataset Splits | Yes | We use the train domain validation set procedure described in Gulrajani & Lopez-Paz (2020) to select the penalty value from the set {1e4, 3.3e4, 6.6e4, 1e5} (with 4:1 train-validation split). A sketch of this selection procedure appears after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions tools like 'sklearn' for regression and 'ReLU' activation, but it does not provide specific version numbers for any software libraries, frameworks, or environments. |
| Experiment Setup | Yes | We use a learning rate of 4.9e-4, batch size of 512 for both ERM and IRM. We use 1000 gradient steps for IRM. As was done in the original IRM work (Arjovsky et al., 2019), we use a threshold on steps (190) after which a large penalty is imposed for violating the IRM constraint. A sketch of this training schedule appears after the table. |
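
The sample-size sweep reported in the Research Type row (train on environments e = 1, 2, test on e = 3, with 1000 up to 60000 training samples and 10 trials per size) can be organized roughly as below. This is a minimal sketch of the protocol only: `train_and_test` is a hypothetical placeholder for the actual ERM/IRM training and environment-3 evaluation, and the sample-size grid shown is illustrative rather than the paper's exact grid.

```python
# Sketch of the evaluation protocol: sweep the training-set size and report
# mean/std of held-out-environment accuracy over 10 trials per size.
import random
import statistics

def train_and_test(n_samples: int, seed: int) -> float:
    """Hypothetical placeholder: train a model (ERM or IRM) on environments
    1-2 with `n_samples` examples and return accuracy on environment 3."""
    random.seed(seed + n_samples)
    return random.uniform(0.5, 1.0)  # stand-in for a real test accuracy

sample_sizes = [1000, 2000, 5000, 10000, 20000, 40000, 60000]  # illustrative grid
results = {}
for n in sample_sizes:
    accs = [train_and_test(n, seed=trial) for trial in range(10)]  # 10 trials per size
    results[n] = (statistics.mean(accs), statistics.stdev(accs))
    print(f"n={n:>6}: test acc {results[n][0]:.3f} +/- {results[n][1]:.3f}")
```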
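The train-domain validation procedure in the Dataset Splits row (a 4:1 train-validation split, with the IRM penalty weight chosen from {1e4, 3.3e4, 6.6e4, 1e5}) amounts to a small grid search over the penalty value. The sketch below assumes two training environments; `train_irm` and `accuracy` are hypothetical placeholders, not functions from the paper's repository.

```python
# Sketch of train-domain validation (Gulrajani & Lopez-Paz, 2020) for selecting
# the IRM penalty weight: hold out 1/5 of each training environment, train one
# model per candidate weight, keep the weight with the best validation accuracy.
import numpy as np

rng = np.random.default_rng(0)

def split_4_to_1(n_examples: int):
    """Return (train_idx, val_idx) for a 4:1 split of one environment."""
    idx = rng.permutation(n_examples)
    cut = int(0.8 * n_examples)
    return idx[:cut], idx[cut:]

def train_irm(penalty_weight, train_splits):
    return {"penalty": penalty_weight}   # placeholder "model"

def accuracy(model, val_splits):
    return rng.uniform(0.5, 1.0)         # placeholder validation accuracy

env_sizes = [5000, 5000]                  # two training environments (e = 1, 2)
splits = [split_4_to_1(n) for n in env_sizes]
train_splits = [tr for tr, _ in splits]
val_splits = [va for _, va in splits]

penalty_grid = [1e4, 3.3e4, 6.6e4, 1e5]   # candidate values reported in the paper
scores = {}
for w in penalty_grid:
    model = train_irm(w, train_splits)
    scores[w] = accuracy(model, val_splits)

best_penalty = max(scores, key=scores.get)
print("selected IRM penalty weight:", best_penalty)
```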
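The Experiment Setup row gives the IRM optimization schedule: learning rate 4.9e-4, 1000 gradient steps, and a large penalty weight imposed only after step 190. Below is a minimal sketch of that schedule under stated assumptions, using the standard IRMv1 gradient penalty from Arjovsky et al. (2019); the synthetic environments and the small ReLU network are illustrative stand-ins for the paper's pipeline, and the batch size of 512 is replaced by full-batch updates for brevity.

```python
# Sketch of IRM training with penalty annealing: small penalty weight for the
# first PENALTY_ANNEAL_STEP steps, then the large selected weight afterwards.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_env(n, spurious_corr, seed):
    """Toy binary task: one invariant feature plus one spuriously correlated feature."""
    g = torch.Generator().manual_seed(seed)
    y = (torch.rand(n, generator=g) < 0.5).float()
    inv = y + 0.5 * torch.randn(n, generator=g)           # invariant feature
    agree = torch.rand(n, generator=g) < spurious_corr    # spurious agreement with label
    spu = torch.where(agree, y, 1.0 - y) + 0.1 * torch.randn(n, generator=g)
    return torch.stack([inv, spu], dim=1), y

def irm_penalty(logits, y):
    """IRMv1 penalty: squared gradient of the risk w.r.t. a dummy classifier scale."""
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits.squeeze(-1) * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

# Two training environments (e = 1, 2) with different spurious correlations.
envs = [make_env(5000, 0.9, seed=1), make_env(5000, 0.8, seed=2)]

model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=4.9e-4)      # learning rate from the paper

PENALTY_WEIGHT = 1e4        # selected from {1e4, 3.3e4, 6.6e4, 1e5} via validation
PENALTY_ANNEAL_STEP = 190   # large penalty imposed only after this step, as in the paper

for step in range(1000):    # 1000 gradient steps for IRM
    risks, penalties = [], []
    for x, y in envs:
        logits = model(x)
        risks.append(F.binary_cross_entropy_with_logits(logits.squeeze(-1), y))
        penalties.append(irm_penalty(logits, y))
    weight = PENALTY_WEIGHT if step >= PENALTY_ANNEAL_STEP else 1.0
    loss = torch.stack(risks).mean() + weight * torch.stack(penalties).mean()
    if weight > 1.0:
        loss = loss / weight  # rescale so the effective step size stays reasonable
    opt.zero_grad()
    loss.backward()
    opt.step()
```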