Do causal predictors generalize better to new domains?
Authors: Vivian Nastl, Moritz Hardt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets...allowing us to test how well a model trained in one domain performs in another. |
| Researcher Affiliation | Academia | Vivian Y. Nastl, Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center; Max Planck ETH Center for Learning Systems; vivian.nastl@tuebingen.mpg.de. Moritz Hardt, Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center; hardt@is.mpg.de |
| Pseudocode | No | The paper describes experimental procedures and methods in paragraph text and figures, but it does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code is based on Gardner et al. [2023], Hardt and Kim [2023] and Gulrajani and Lopez-Paz [2020]. It is available at https://github.com/socialfoundations/causal-features. |
| Open Datasets | Yes | We consider 16 prediction tasks on tabular datasets from prior work [Ding et al., 2021, Hardt and Kim, 2023, Gardner et al., 2023]...Table 1: Description of tasks, data sources and number of features in each selection. |
| Dataset Splits | Yes | We have a train/test/validation split within the in-domain set, and a test/validation split within the out-of-domain set. |
| Hardware Specification | Yes | Each job was given the same computing resources: 1 CPU. Compute nodes use AMD EPYC 7662 64-core CPUs. Memory was allocated as required for each task: all jobs were allocated at least 128GB of RAM; for the Public Coverage task, jobs were allocated 384GB of RAM. |
| Software Dependencies | No | The paper mentions several software components and libraries, such as 'Hyperopt [Bergstra et al., 2013]' and machine learning algorithms (XGBoost, LightGBM, IRM, REx, etc.), but it does not specify their version numbers. |
| Experiment Setup | Yes | We conduct a hyperparameter sweep using Hyperopt [Bergstra et al., 2013] on the in-domain validation data. A method is tuned for 50 trials. We exclusively train on the training set. |
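The split scheme the paper describes (train/test/validation within the in-domain data, test/validation only within the out-of-domain data, since no training happens in the target domain) can be sketched as follows. The function, seed, dataset sizes, and split ratios are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def split_indices(n, ratios, seed=0):
    """Shuffle n indices and partition them according to the given ratios."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cuts = (np.cumsum(ratios)[:-1] * n).astype(int)
    return np.split(idx, cuts)

# In-domain data: train/test/validation split (illustrative 60/20/20 ratios).
train_id, test_id, val_id = split_indices(10_000, [0.6, 0.2, 0.2])

# Out-of-domain data: only test/validation, because the model is never
# trained on the target domain (illustrative 50/50 ratio).
test_ood, val_ood = split_indices(5_000, [0.5, 0.5])
```

Shuffling before cutting keeps the three in-domain subsets disjoint, which is what lets the in-domain validation set drive model selection without leaking test data.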
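The tuning protocol (50 trials, each trained on the training set and scored on the in-domain validation set) can be sketched with a plain random-search loop. This is a stand-in for the paper's Hyperopt-based sweep; the `tune` helper, the search space, and the toy objective are all hypothetical.

```python
import random

def tune(objective, space, n_trials=50, seed=0):
    """Random-search stand-in for the 50-trial sweep: sample one
    configuration per trial, evaluate it, and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: sample(rng) for name, sample in space.items()}
        loss = objective(cfg)  # in-domain validation loss for this config
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Hypothetical search space for a gradient-boosting model.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform
    "max_depth": lambda rng: rng.randint(2, 10),
}

# Toy objective standing in for "train on the training set, then score
# on the in-domain validation set"; its minimum is at learning_rate=0.1.
best_cfg, best_loss = tune(lambda c: (c["learning_rate"] - 0.1) ** 2, space)
```

Hyperopt's TPE sampler would adapt the proposal distribution across trials instead of sampling independently, but the selection logic, picking the configuration with the lowest in-domain validation loss, is the same.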