On Inductive Biases for Heterogeneous Treatment Effect Estimation

Authors: Alicia Curth, Mihaela van der Schaar

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement instantiations of all approaches using NNs and evaluate their performance across a wide range of semi-synthetic experiments. We empirically confirm that all approaches can improve upon baselines, including both end-to-end and multi-stage approaches, and present a number of insights into the relative strengths of each approach.
Researcher Affiliation | Academia | Alicia Curth, University of Cambridge (amc253@cam.ac.uk); Mihaela van der Schaar, University of Cambridge, University of California, Los Angeles, and The Alan Turing Institute (mv472@cam.ac.uk)
Pseudocode | Yes | Refer to Appendix B.3 for pseudocode of a FlexTENet forward pass. (A rough illustrative sketch is given after the table below.)
Open Source Code | Yes | Code to replicate all experiments is available at https://github.com/AliciaCurth/CATENets (a minimal usage sketch follows the table below).
Open Datasets | Yes | For setups A&B, we use the ACIC2016 covariates (n = 4802, d = 55) of [45] but design our own response surfaces... For setups C&D, we use the IHDP benchmark (n = 747, d = 25), into which [1] introduced confounding, imbalance (18% treated) and incomplete overlap. (An illustrative semi-synthetic data-generating sketch follows the table below.)
Dataset Splits | No | The paper mentions '90/10 train-test splits' for the IHDP benchmark, but does not explicitly detail a separate validation split or its proportions for all experiments. For setups A&B, it states using '500 units for testing' and varying n0 and n1 for the control and treatment groups, but does not specify a validation set.
Hardware Specification | No | The paper does not specify the hardware used for running the experiments.
Software Dependencies | No | The paper states that models are implemented as neural networks (NNs) and Appendix B.3 provides pseudocode, but it does not specify any software dependencies with version numbers (e.g., 'PyTorch 1.x', 'Python 3.x').
Experiment Setup | Yes | We implement all neural network models in PyTorch. For all models, we use the Adam optimizer with a learning rate of 1e-3. We train for 100 epochs with a batch size of 256. Early stopping is applied with a patience of 10 epochs. We use ELU activation functions throughout the network. The hidden layers of the representation Φ and the regression heads h_w consist of 3 and 2 layers, respectively. (A hedged PyTorch sketch of this configuration is given below.)
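
The Appendix B.3 pseudocode itself is not reproduced on this page. As a rough illustration only, the sketch below implements a forward pass through layers with a shared subspace and two treatment-specific (private) subspaces, which is the structural idea FlexTENet is built around. Layer widths, the exact wiring between shared and private components, and the omission of the paper's regularization terms are assumptions; this is not the authors' pseudocode.

```python
import torch
import torch.nn as nn


class SharedPrivateLayer(nn.Module):
    """One layer with a shared subspace and two treatment-specific (private) subspaces.

    Rough sketch only: the shared output feeds all subspaces of the next layer,
    while each private subspace additionally receives its own private output.
    Widths and wiring here are assumptions, not the paper's specification.
    """

    def __init__(self, in_shared, in_private, out_shared, out_private):
        super().__init__()
        self.shared = nn.Linear(in_shared, out_shared)
        # private subspaces see [shared, own private] features
        self.private0 = nn.Linear(in_shared + in_private, out_private)
        self.private1 = nn.Linear(in_shared + in_private, out_private)
        self.act = nn.ELU()

    def forward(self, h_s, h_0, h_1):
        s = self.act(self.shared(h_s))
        p0 = self.act(self.private0(torch.cat([h_s, h_0], dim=1)))
        p1 = self.act(self.private1(torch.cat([h_s, h_1], dim=1)))
        return s, p0, p1


class FlexTENetSketch(nn.Module):
    """Stacks shared/private layers and reads out one potential-outcome head per arm."""

    def __init__(self, d_in, width=100, depth=3):
        super().__init__()
        layers = []
        in_s = in_p = d_in
        for _ in range(depth):
            layers.append(SharedPrivateLayer(in_s, in_p, width, width))
            in_s = in_p = width
        self.layers = nn.ModuleList(layers)
        self.head0 = nn.Linear(2 * width, 1)  # combines shared + private_0
        self.head1 = nn.Linear(2 * width, 1)  # combines shared + private_1

    def forward(self, x):
        h_s = h_0 = h_1 = x
        for layer in self.layers:
            h_s, h_0, h_1 = layer(h_s, h_0, h_1)
        mu0 = self.head0(torch.cat([h_s, h_0], dim=1))
        mu1 = self.head1(torch.cat([h_s, h_1], dim=1))
        return mu0, mu1, mu1 - mu0  # potential outcomes and CATE estimate
```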
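For orientation on the released code, below is a minimal usage sketch of the CATENets repository, assuming the scikit-learn-style fit/predict interface described in its README; the import path, constructor defaults, and the fit(X, y, w) signature should be treated as assumptions and checked against the current repository documentation.

```python
# Hypothetical usage sketch of the CATENets library (assumed installable via `pip install catenets`).
# Import path and method signatures are assumptions; consult the repository README.
import numpy as np
from catenets.models.jax import TNet  # assumed import path

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 25))                                # covariates
w = rng.integers(0, 2, size=500)                              # binary treatment indicator
y = X[:, 0] + w * X[:, 1] + rng.normal(scale=0.1, size=500)   # toy outcome

model = TNet()               # two separate outcome regressions, one per treatment arm
model.fit(X, y, w)           # assumed scikit-learn-style signature
cate_hat = model.predict(X)  # estimated CATE tau(x) = mu_1(x) - mu_0(x)
```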
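To make the "design our own response surfaces" phrase concrete, the snippet below shows the general semi-synthetic recipe: real covariates combined with simulated treatment assignment and simulated potential outcomes, so that the ground-truth effect is known for evaluation. The functional forms and coefficients are invented for illustration and are not the paper's setups A-D.

```python
# Illustration of the semi-synthetic recipe: real covariates X, simulated
# treatment and potential outcomes. The functional forms below are invented
# for illustration and are NOT the paper's setups A-D.
import numpy as np


def simulate_semi_synthetic(X, seed=0):
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # simulated confounded treatment assignment
    propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))
    w = rng.binomial(1, propensity)
    # simulated potential-outcome surfaces (hypothetical choices)
    mu0 = X[:, :5].sum(axis=1)
    mu1 = mu0 + np.maximum(X[:, 5], 0.0)   # heterogeneous treatment effect
    y = np.where(w == 1, mu1, mu0) + rng.normal(scale=1.0, size=n)
    cate_true = mu1 - mu0                  # ground truth kept for evaluation only
    return w, y, cate_true
```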
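The quoted experiment setup maps naturally onto a TARNet-style architecture: a shared representation Φ (3 hidden layers) feeding two treatment-specific regression heads h_w (2 hidden layers each). The sketch below wires up that architecture with the stated hyperparameters (Adam, learning rate 1e-3, 100 epochs, batch size 256, early stopping with patience 10, ELU activations); the layer width, the loss, and the early-stopping bookkeeping are assumptions not given in the quote.

```python
import torch
import torch.nn as nn


def mlp(d_in, width, depth):
    """Stack of Linear + ELU layers (ELU per the quoted setup; the width is an assumption)."""
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ELU()]
        d = width
    return nn.Sequential(*layers)


class TARNetSketch(nn.Module):
    """Shared representation Phi (3 layers) with per-treatment regression heads h_w (2 layers each)."""

    def __init__(self, d_in, width=200):
        super().__init__()
        self.phi = mlp(d_in, width, depth=3)
        self.head0 = nn.Sequential(mlp(width, width, depth=2), nn.Linear(width, 1))
        self.head1 = nn.Sequential(mlp(width, width, depth=2), nn.Linear(width, 1))

    def forward(self, x):
        z = self.phi(x)
        return self.head0(z), self.head1(z)


def train(model, X, y, w, X_val, y_val, w_val,
          lr=1e-3, epochs=100, batch_size=256, patience=10):
    """Adam, lr 1e-3, 100 epochs, batch size 256, early stopping with patience 10 (per the quote)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stale = float("inf"), None, 0
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i + batch_size]
            mu0, mu1 = model(X[idx])
            # regress the observed outcome on the head matching the received treatment
            pred = torch.where(w[idx].bool(), mu1.squeeze(-1), mu0.squeeze(-1))
            loss = nn.functional.mse_loss(pred, y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            mu0, mu1 = model(X_val)
            pred = torch.where(w_val.bool(), mu1.squeeze(-1), mu0.squeeze(-1))
            val_loss = nn.functional.mse_loss(pred, y_val).item()
        if val_loss < best_val:  # keep the best validation checkpoint
            best_val, stale = val_loss, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```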