Provably Invariant Learning without Domain Information
Authors: Xiaoyu Tan, Yong Lin, Shengyu Zhu, Chao Qu, Xihe Qiu, Yinghui Xu, Peng Cui, Yuan Qi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we perform method evaluation on both synthetic and real-world datasets. This evaluation also experimentally tests our theoretical analysis discussed in Section 3.3. For baseline methods, we choose ERM to demonstrate typical OOD performance under the IID assumption, and IRM (Arjovsky et al., 2019) and group DRO (Sagawa et al., 2019) with the given ground-truth environment segmentation to acquire the best OOD performance. We choose HRM (Liu et al., 2021), EIIL (Creager et al., 2021), ZIN (Yong et al., 2022), and LfF (Nam et al., 2020) to compare with our proposed method, since these algorithms are also designed to perform invariance learning without a provided environment partition. |
| Researcher Affiliation | Collaboration | Xiaoyu Tan*1, Yong Lin*2, Shengyu Zhu3, Chao Qu1, Xihe Qiu4, Yinghui Xu5, Peng Cui6, Yuan Qi5 (*equal contribution). 1INF Technology (Shanghai) Co., Ltd., Shanghai, China; 2Hong Kong University of Science and Technology, Hong Kong, China; 3Ubiquant Investment, Beijing, China; 4School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China; 5Artificial Intelligence Innovation and Incubation (AI³) Institute, Fudan University, Shanghai, China; 6Department of Computer Science and Technology, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 TIVA |
| Open Source Code | Yes | The code is released in the GitHub repository TIVA. |
| Open Datasets | Yes | 4.2.1. CELEBA DATASET: We perform this experiment based on the open-source dataset CelebA (Liu et al., 2018), which contains face images of celebrities. |
| Dataset Splits | No | No explicit details on a separate validation split were found; only train and test splits were clearly defined for reproduction (e.g., "For train and test dataset separation, we use non-African locations as train set and African locations as test set to validate the OOD performance."). A minimal sketch of this location-based split appears after the table. |
| Hardware Specification | Yes | We perform all the experiments on Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions optimizers like Adam (Kingma and Ba, 2014) and SGD (Bottou, 2012) and refers to "neural network architecture" or "linear layer" but does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | The hyperparameters used by our proposed method in Section 4 are shown in Table 10. Here we represent the neural network architecture as a list; for example, the v we used across all the experiments is a 2-layer MLP with 32 neurons, which is denoted as [32, 32]. We use the ReLU activation function across all networks (Li and Yuan, 2017). For the encoders u_{ω_u} and q_{ω_q}, the output layer is activated by the tanh function (Xiao et al., 2005). These hyperparameters are fine-tuned by grid search over 5 trials. In the synthetic simulation, we utilize one linear layer [16] as Φ with a batch size of 1024. We train the model for 5000 epochs with 4500 epochs of environment annealing. A hedged architecture sketch appears at the end of this section. |
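Since the paper specifies the split only at the level of locations, the following is a minimal, hypothetical sketch of the location-based OOD separation quoted in the Dataset Splits row. The metadata layout, the `location` column name, the `metadata.csv` filename, and the set of African locations are all assumptions for illustration; the paper does not enumerate them.

```python
# Hypothetical sketch of the location-based OOD split described in the paper.
# Assumes per-sample metadata in a pandas DataFrame with a "location" column;
# the actual metadata format and the list of African locations are not given.
import pandas as pd

AFRICAN_LOCATIONS = {"location_a", "location_b"}  # placeholder names

def split_by_location(meta: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Non-African locations form the train set; African locations form the
    held-out test set used to validate OOD performance."""
    is_african = meta["location"].isin(AFRICAN_LOCATIONS)
    return meta[~is_african], meta[is_african]

train_meta, test_meta = split_by_location(pd.read_csv("metadata.csv"))
```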
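For the Experiment Setup row, here is a minimal sketch of the quoted architecture conventions, assuming PyTorch (the paper does not name its framework): width lists such as [32, 32] denote MLP hidden layers, hidden activations are ReLU, and the encoder output layers use tanh. All input/output dimensions and the variable names `v`, `u`, `q`, and `phi` are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of the architecture settings quoted above (PyTorch assumed).
# Width lists like [32, 32] denote hidden layers; all dims are assumptions.
import torch.nn as nn

def mlp(in_dim: int, widths: list[int], out_dim: int,
        out_act: nn.Module | None = None) -> nn.Sequential:
    """Build an MLP from a width list with ReLU hidden activations."""
    layers, prev = [], in_dim
    for w in widths:
        layers += [nn.Linear(prev, w), nn.ReLU()]
        prev = w
    layers.append(nn.Linear(prev, out_dim))
    if out_act is not None:
        layers.append(out_act)  # e.g. tanh on the encoder output layers
    return nn.Sequential(*layers)

feat_dim = 16                               # assumed feature dimension
v = mlp(feat_dim, [32, 32], 1)              # v: 2-layer MLP denoted [32, 32]
u = mlp(feat_dim, [32, 32], 8, nn.Tanh())   # encoder u_{omega_u}: tanh output
q = mlp(feat_dim, [32, 32], 8, nn.Tanh())   # encoder q_{omega_q}: tanh output
phi = nn.Linear(10, 16)                     # synthetic simulation: Phi = [16]

BATCH_SIZE = 1024     # synthetic simulation, per the paper
EPOCHS = 5000         # total training epochs
ANNEAL_EPOCHS = 4500  # environment-annealing epochs
```

The `mlp` helper mirrors the paper's list notation directly, so each entry in Table 10 can be instantiated by passing its width list; only the unreported input and output dimensions need to be supplied.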