Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach
Authors: Ziliang Chen, Yongsen Zheng, Zhao-Rong Lai, Quanlong Guan, Liang Lin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments verified the superiority of our approach in combating the fake invariance issue across a variety of OOD generalization benchmarks. In this section, we first conduct diagnostic experiments on benchmarks derived from recent studies (Arjovsky et al. 2019; Ahmed et al. 2020) broadly applied in IRL for OOD generalization. |
| Researcher Affiliation | Academia | Ziliang Chen¹,², Yongsen Zheng³, Zhao-Rong Lai¹, Quanlong Guan¹*, Liang Lin³; ¹Jinan University, ²Pazhou Lab, ³Sun Yat-sen University; c.ziliang@yahoo.com, z.yongsensmile@gmail.com, {laizhr,Gql}@jnu.edu.cn, linliang@ieee.org |
| Pseudocode | No | The paper describes its methods using text and mathematical formulations but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that code is released or available. |
| Open Datasets | Yes | Benchmarks. The diagnostic study provides a forensic analysis of IRL baselines under the RS-SCM Assumption. It requires datasets generated by the same causal mechanism; however, existing diagnostic benchmarks are generated by either the FIIF or the PIIF SCM, hardly fulfilling our demand (Arjovsky et al. 2019; Ahmed et al. 2020). We observe that the RS-SCM combines the FIIF and PIIF SCM Assumptions, so we combine their generation recipes to build our diagnostic benchmark for evaluating invariant learning quality. Specifically, we consider the ten-class digit classification task derived from the CS-MNIST (FIIF) benchmark in (Ahuja et al. 2021). It consists of two training environments with 20,000 samples each and one evaluation environment with the same amount of data. [...] In this case, we follow the composition rule in CIFAR-MNIST (Zhou et al. 2022) [...] Beyond the diagnostic datasets, we also conduct experiments on VLCS, PACS, OfficeHome, TerraIncognita and DomainNet, which follow DomainBed (Gulrajani and Lopez-Paz 2020) for real-world OOD generalization. A hedged construction sketch follows this table. |
| Dataset Splits | Yes | 20% of the training data is split off into a validation set for CS-MNIST-CIFAR and CS-MNIST-COCO, on which all baselines are evaluated to produce their in-distribution (ID) performances. We employ model selection by leave-one-domain-out cross-validation. A split sketch follows this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | No | The paper describes the overall training framework and some architectural choices, but it does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rates, batch sizes, specific optimizer settings), number of epochs, or other system-level training configurations in the main text. |
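
The quoted benchmark description (two training environments of 20,000 samples each plus one evaluation environment, built by combining FIIF-style and PIIF-style spurious cues on MNIST digits with CIFAR backgrounds) can be illustrated with a minimal sketch. This is not the authors' released code: the correlation strengths (0.9 / 0.8 / 0.1), the colour-tinting scheme, and the `make_environment` helper are assumptions modelled on Colored-MNIST-style recipes, since the paper's exact generation parameters are not given in the quoted text.

```python
# Hedged sketch of a CS-MNIST-CIFAR-style diagnostic environment: an invariant
# digit shape plus an FIIF-style spurious colour cue and a PIIF-style spurious
# background cue. All parameter values below are illustrative assumptions.
import numpy as np

def make_environment(digits, labels, backgrounds, bg_labels,
                     spurious_corr, n_samples, rng):
    """Compose one environment of `n_samples` images.

    digits:        (N, 28, 28) greyscale MNIST digits   (invariant feature)
    labels:        (N,) ten-class digit labels
    backgrounds:   (M, 32, 32, 3) CIFAR images           (PIIF-style spurious feature)
    bg_labels:     (M,) CIFAR class ids, reused as spurious "context" classes
    spurious_corr: probability that the spurious cues agree with the label
    """
    idx = rng.choice(len(digits), size=n_samples, replace=False)
    images, targets = [], []
    for i in idx:
        y = int(labels[i])
        # Decide whether the spurious cues follow the label in this environment.
        agree = rng.random() < spurious_corr
        spurious_class = y if agree else int(rng.integers(10))

        # FIIF-style cue: tint the digit with a colour indexed by the spurious
        # class (toy mapping of ten classes onto three channels, for illustration).
        colour = np.zeros(3)
        colour[spurious_class % 3] = 1.0
        digit_rgb = digits[i][..., None] / 255.0 * colour  # (28, 28, 3)

        # PIIF-style cue: paste the digit onto a background whose CIFAR class
        # also tracks the spurious class (composition rule as in CIFAR-MNIST).
        bg_pool = np.flatnonzero(bg_labels == spurious_class)
        bg = backgrounds[rng.choice(bg_pool)].astype(np.float32) / 255.0
        canvas = bg.copy()
        canvas[2:30, 2:30] = np.maximum(canvas[2:30, 2:30], digit_rgb)

        images.append(canvas)
        targets.append(y)
    return np.stack(images), np.asarray(targets)

# Usage (assumed sizes): two training environments with different spurious
# strengths plus one evaluation environment, 20,000 samples each.
# rng = np.random.default_rng(0)
# env1 = make_environment(mnist_x, mnist_y, cifar_x, cifar_y, 0.9, 20_000, rng)
# env2 = make_environment(mnist_x, mnist_y, cifar_x, cifar_y, 0.8, 20_000, rng)
# test = make_environment(mnist_x, mnist_y, cifar_x, cifar_y, 0.1, 20_000, rng)
```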
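The two evaluation protocols quoted in the Dataset Splits row (a 20% in-distribution validation split and leave-one-domain-out cross-validation for model selection) can likewise be sketched. This is a hedged illustration, not the authors' code: the function names and the `train_and_eval` hook are hypothetical placeholders, and the seed and domain list in the usage note are assumptions.

```python
# Hedged sketch of (a) a 20% ID validation split and (b) leave-one-domain-out
# cross-validation model selection, in the DomainBed style.
from typing import Callable, Dict, List, Sequence, Tuple
import numpy as np

def id_validation_split(n_train: int, val_frac: float = 0.2,
                        seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Hold out `val_frac` of the training indices as an in-distribution validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_train)
    n_val = int(round(val_frac * n_train))
    return idx[n_val:], idx[:n_val]          # train indices, validation indices

def leave_one_domain_out(domains: Sequence[str],
                         train_and_eval: Callable[[List[str], str], float]
                         ) -> Dict[str, float]:
    """For each held-out domain, train on the remaining domains and record its score.

    `train_and_eval(train_domains, held_out_domain)` is an assumed user-supplied
    hook that fits a model on `train_domains` and returns accuracy on the held-out one.
    """
    scores = {}
    for held_out in domains:
        train_domains = [d for d in domains if d != held_out]
        scores[held_out] = train_and_eval(train_domains, held_out)
    return scores

# Usage sketch: select hyperparameters by the mean leave-one-domain-out score.
# pacs_domains = ["art_painting", "cartoon", "photo", "sketch"]
# scores = leave_one_domain_out(pacs_domains, train_and_eval=my_runner)
# mean_score = sum(scores.values()) / len(scores)
```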