SpaCE: The Spatial Confounding Environment
Authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Mirabai Woodward, Jie Kate Hu, Naeem Khoshnevis, Francesca Dominici
Venue: ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models. The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions. |
| Researcher Affiliation | Academia | Mauricio Tec, Ana Trisovic, Michelle Audirac ({mauriciogtec,trisovic,maudirac,khu}@hsph.harvard.edu), Department of Biostatistics, Harvard University; Sophie Woodward, Naeem Khoshnevis ({swoodward,nkhoshnevis}@fas.harvard.edu), Department of Biostatistics, Harvard University; Francesca Dominici (fdominic@hsph.harvard.edu), Department of Biostatistics, Harvard University |
| Pseudocode | Yes | Algorithm 1: Spatially-aware validation split selection |
| Open Source Code | Yes | The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions. ... Footnote 1: The SpaCE source code is available at https://anonymous.4open.science/r/space-BC93 |
| Open Datasets | Yes | SpaCE datasets comprise real treatment and confounder data from publicly available sources commonly used in environmental health, social science, economics, and climate science studies, among other domains. ... Table 8: Major existing reused data resources |
| Dataset Splits | Yes | We found it critical to implement a spatially-aware train-validation data split (Roberts et al., 2017)... This algorithm is described in Algorithm 1 in the supplement. ... Using the default parameters specified in Algorithm 1, we consistently obtain training splits of size 50%-70% and validation splits of size 10%-20%. |
| Hardware Specification | Yes | The computations in this paper were run on a Mac OS M1 with ten cores in approximately 24 hours, with most of the computation driven by training the graph neural network benchmarks. |
| Software Dependencies | Yes | We fit the ensemble using the AutoGluon Python package (Erickson et al., 2020) ... Table 5: Hyperparameters used in AutoML, package AutoGluon v0.7.0 |
| Experiment Setup | Yes | Table 5 describes the default settings used for AutoGluon. ... Table 6 summarizes our hyperparameter search space for different baseline models. |
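
The Dataset Splits row refers to a spatially-aware train-validation split. The sketch below illustrates the general idea only and is not a reproduction of the paper's Algorithm 1: it grows validation regions around randomly chosen seed nodes of a spatial graph and then drops a buffer of neighboring nodes from the training set, so that no training node touches a validation node. The function names (`grid_graph`, `expand`, `spatial_split`), the parameters (`seed_frac`, `levels`, `buffer`), and the toy grid graph are all assumptions made for this illustration.

```python
import random


def grid_graph(n_rows, n_cols):
    """Toy 4-neighbor grid graph as an adjacency dict (stand-in for a real spatial graph)."""
    adj = {}
    for r in range(n_rows):
        for c in range(n_cols):
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    nbrs.append((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def expand(adj, nodes, hops):
    """Return every node within `hops` graph steps of `nodes`, including `nodes`."""
    seen = set(nodes)
    frontier = set(nodes)
    for _ in range(hops):
        next_frontier = set()
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.add(v)
        frontier = next_frontier
    return seen


def spatial_split(adj, seed_frac=0.02, levels=1, buffer=1, seed=0):
    """Hypothetical spatially-aware split (illustrative, not the paper's algorithm).

    1. Sample a fraction `seed_frac` of nodes as validation seeds.
    2. Grow each seed by `levels` hops to form contiguous validation regions.
    3. Exclude nodes within `buffer` hops of the validation set from training.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_seeds = max(1, int(seed_frac * len(nodes)))
    seeds = rng.sample(nodes, n_seeds)
    val = expand(adj, seeds, levels)
    blocked = expand(adj, val, buffer)  # validation nodes plus the buffer ring
    train = set(nodes) - blocked
    buffer_nodes = blocked - val        # used for neither training nor validation
    return train, val, buffer_nodes


if __name__ == "__main__":
    adj = grid_graph(20, 20)
    train, val, buf = spatial_split(adj)
    total = len(adj)
    print(f"train: {len(train) / total:.0%}, val: {len(val) / total:.0%}, "
          f"buffer: {len(buf) / total:.0%}")
```

Because buffer nodes are discarded, the training and validation fractions do not sum to 100%, which is consistent with the split sizes quoted in the table. The exact proportions depend on the graph and on the chosen parameters.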