Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration
Authors: Kotaro Yoshida, Hiroki Naganuma
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a comparative analysis of datasets with distributional shifts, we observe that Information Bottleneck-based IRM achieves consistent calibration across different environments. This observation suggests that information compression techniques, such as the Information Bottleneck, are potentially effective in achieving model invariance. Furthermore, our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated. This demonstrates that invariance and cross-environment calibration are empirically equivalent. |
| Researcher Affiliation | Academia | Kotaro Yoshida¹, Hiroki Naganuma²·³ (¹Tokyo Institute of Technology, ²Mila Quebec AI Institute, ³Université de Montréal) |
| Pseudocode | No | The paper describes methods and optimizations using mathematical formulations and descriptive text, but it does not contain any clearly labeled pseudocode blocks or algorithms formatted like code. |
| Open Source Code | Yes | Our code is available at https://github.com/katoro8989/IRM_Variants_Calibration |
| Open Datasets | Yes | Specifically, the datasets used were Colored MNIST (CMNIST) (Arjovsky et al., 2020), Rotated MNIST (RMNIST) (Ghifary et al., 2015), PACS (Li et al., 2017), and VLCS (Fang et al., 2013), sourced from the Domain Bed benchmark (Gulrajani & Lopez-Paz, 2020). |
| Dataset Splits | Yes | We split each dataset into training and validation sets, with 80% used for training and the remaining 20% for validation. The environment partitions were as follows: CMNIST: E_train = {10%, 20%}, E_test = {90%}; RMNIST: E_train = {15°, 30°, 45°, 60°, 75°}, E_test = {0°}; PACS: E_train = {Photo, Painting, Sketch}, E_test = {Art}; VLCS: E_train = {Caltech101, LabelMe, SUN09}, E_test = {VOC2007}. |
| Hardware Specification | Yes | We acknowledge the generous allocation of computational resources from the TSUBAME3.0 supercomputer facilitated by the Tokyo Institute of Technology. |
| Software Dependencies | No | For optimization, Adam (Kingma & Ba, 2015) was used consistently across all models, and the tuning of learning rates and hyperparameters for each method was conducted in accordance with their respective papers. No specific version numbers for Adam or other software dependencies are provided. |
| Experiment Setup | Yes | In the experiments, we set the batch size to 256 for CMNIST, 128 for RMNIST, and 16 for PACS and VLCS. Grid search was performed on the learning rate for all experiments, with values of [1e-4, 5e-4, 1e-3, 5e-3]. For the hyperparameters specific to each approximation method, grid search was conducted as shown in Table 3. |
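The 80%/20% train/validation split reported above can be sketched as a simple shuffled partition. This is an illustrative reconstruction, not the authors' released code; the function name `split_dataset` and the fixed seed are assumptions for the example.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle indices with a fixed seed, then split into train/validation.

    Illustrative sketch of an 80/20 split; not taken from the paper's code.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:cut]]
    val = [samples[i] for i in indices[cut:]]
    return train, val

# A dataset of 100 items yields 80 training and 20 validation samples.
train, val = split_dataset(list(range(100)))
```

Note that the split here is over examples within each training environment; the environment partitions listed under "Dataset Splits" are fixed separately.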
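The reported setup (per-dataset batch sizes and a learning-rate grid of [1e-4, 5e-4, 1e-3, 5e-3]) amounts to a small grid search. A minimal sketch follows, assuming a caller-supplied training function that returns a validation score; `grid_search` and `train_fn` are hypothetical names, not identifiers from the paper's repository.

```python
# Batch sizes and learning-rate grid as reported in the experiment setup.
BATCH_SIZES = {"CMNIST": 256, "RMNIST": 128, "PACS": 16, "VLCS": 16}
LEARNING_RATES = [1e-4, 5e-4, 1e-3, 5e-3]

def grid_search(dataset, train_fn):
    """Train once per learning rate; keep the best validation score.

    `train_fn(dataset, lr, batch_size)` is a stand-in for a full training
    run with Adam, as described in the report.
    """
    best_score, best_lr = None, None
    for lr in LEARNING_RATES:
        score = train_fn(dataset, lr=lr, batch_size=BATCH_SIZES[dataset])
        if best_score is None or score > best_score:
            best_score, best_lr = score, lr
    return best_score, best_lr

# Dummy training function for illustration: peaks at lr = 1e-3.
score, lr = grid_search("CMNIST", lambda d, lr, batch_size: -abs(lr - 1e-3))
```

The method-specific hyperparameters mentioned in the report (Table 3 of the paper) would extend this grid with additional axes.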