SpaCE: The Spatial Confounding Environment

Authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Mirabai Woodward, Jie Kate Hu, Naeem Khoshnevis, Francesca Dominici

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models. The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
Researcher Affiliation | Academia | Mauricio Tec, Ana Trisovic, Michelle Audirac {mauriciogtec,trisovic,maudirac,khu}@hsph.harvard.edu, Department of Biostatistics, Harvard University; Sophie Woodward, Naeem Khoshnevis {swoodward,nkhoshnevis}@fas.harvard.edu, Department of Biostatistics, Harvard University; Francesca Dominici fdominic@hsph.harvard.edu, Department of Biostatistics, Harvard University
Pseudocode | Yes | Algorithm 1: Spatially-aware validation split selection
Open Source Code | Yes | The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions. The SpaCE source code is available at https://anonymous.4open.science/r/space-BC93
Open Datasets | Yes | SpaCE datasets comprise real treatment and confounder data from publicly available sources commonly used in environmental health, social science, economics, and climate science studies, among other domains. ... Table 8: Major existing reused data resources
Dataset Splits | Yes | We found it critical to implement a spatially-aware train-validation data split (Roberts et al., 2017)... This algorithm is described in Algorithm 1 in the supplement. ... Using the default parameters specified in Algorithm 1, we consistently obtain training splits of size 50%–70% and validation splits of size 10%–20%.
Hardware Specification | Yes | The computations in this paper were run on a Mac OS M1 with ten cores in approximately 24 hours, with most of the computation driven by training the graph neural network benchmarks.
Software Dependencies | Yes | We fit the ensemble using the AutoGluon Python package (Erickson et al., 2020) ... Table 5: Hyperparameters used in AutoML, package AutoGluon v0.7.0
Experiment Setup | Yes | Table 5 describes the default settings used for AutoGluon. ... Table 6 summarizes our hyperparameter search space for different baseline models.
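
The Dataset Splits entry refers to a spatially-aware train-validation split but does not reproduce Algorithm 1 here. As a rough illustration of the general idea only, the sketch below picks random seed nodes on a spatial graph, assigns everything within a few hops to validation, and discards a surrounding buffer ring so that no training node borders a validation node. It is not the paper's Algorithm 1: the function names (`spatial_split`, `grid_graph`), the parameters (`n_seeds`, `val_radius`, `buffer_radius`) and the toy lattice are all hypothetical, and the resulting split sizes are not meant to match the 50%–70% / 10%–20% figures quoted above.

```python
from collections import deque
import random


def spatial_split(adjacency, n_seeds=2, val_radius=1, buffer_radius=1, rng_seed=0):
    """Split graph nodes into train / validation / buffer sets.

    adjacency: dict mapping node -> list of neighbouring nodes.
    Validation nodes lie within `val_radius` hops of a random seed node.
    Nodes within a further `buffer_radius` hops are set aside, so with
    buffer_radius >= 1 no training node is adjacent to a validation node.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(rng_seed)
    seeds = rng.sample(sorted(adjacency), n_seeds)

    # Multi-source breadth-first search: hop distance to the nearest seed.
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    max_hops = val_radius + buffer_radius
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the buffer ring
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    val = {n for n, d in dist.items() if d <= val_radius}
    buffer = {n for n, d in dist.items() if d > val_radius}
    train = set(adjacency) - val - buffer
    return train, val, buffer


def grid_graph(rows, cols):
    """4-neighbour lattice, used only as toy data for the demo."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbs = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            adj[(r, c)] = [(i, j) for i, j in nbs if 0 <= i < rows and 0 <= j < cols]
    return adj


adj = grid_graph(10, 10)
train, val, buffer = spatial_split(adj, n_seeds=3, val_radius=1, buffer_radius=1)
print(len(train), len(val), len(buffer))
```

Dropping the buffer nodes costs some data, but it limits leakage between neighbouring locations, which is the usual motivation for spatial rather than uniformly random validation splits.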