Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models
Authors: Andrew Jesson, Sören Mindermann, Uri Shalit, Yarin Gal
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show empirical evidence for the following claims: that our uncertainty-aware methods are robust both to violations of the overlap assumption and a failure mode of propensity-based trimming (6.1); that they indicate high uncertainty when covariate shifts occur between training and test distributions (6.2); and that they yield lower CATE estimation errors while rejecting fewer points than propensity-based trimming (6.2). |
| Researcher Affiliation | Academia | Andrew Jesson, Department of Computer Science, University of Oxford, Oxford, UK OX1 3QD, andrew.jesson@cs.ox.ac.uk; Sören Mindermann, Department of Computer Science, University of Oxford, Oxford, UK OX1 3QD, soren.mindermann@cs.ox.ac.uk; Uri Shalit, Technion, Haifa, Israel 3200003, urishalit@technion.ac.il; Yarin Gal, Department of Computer Science, University of Oxford, Oxford, UK OX1 3QD, yarin.gal@cs.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The preceding results can be reproduced using publicly available code (available at: https://github.com/OATML/ucate). |
| Open Datasets | Yes | We introduce a new, high-dimensional, individual-level causal effect prediction benchmark dataset called CEMNIST to demonstrate robustness to overlap and propensity failure (6.1). Finally, we introduce a modification to the IHDP causal inference benchmark to explore covariate shift (6.2). We report results for the unaltered IHDP dataset in figure 3a and the l.h.s. of table 2. This supports that uncertainty rejection is more data-efficient, i.e., errors are lower while rejecting less. This is further supported by the results on ACIC 2016 [11] (figure 3c and the r.h.s. of table 2). |
| Dataset Splits | Yes | We generated 1000 datasets following the original IHDP experimental setup, each with 747 data points (561 treated and 186 control patients) randomly split into training (600) and test (147) sets. (A minimal sketch of this split appears below the table.) |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU/GPU models, memory, or specific computing environments used for experiments. |
| Software Dependencies | No | The paper mentions general software like 'TensorFlow' (in references) and 'Python' (for data generation) but does not provide specific version numbers for key software components or libraries used in their experimental setup. |
| Experiment Setup | Yes | MC Dropout is a simple change to existing methods. Gal & Ghahramani [15] showed that we can simply add dropout [52] with L2 regularization in each of ω0, ω1 during training and then sample from the same dropout distribution at test time to get samples from q(ω0, ω1|D). With tuning of the dropout probability, this is equivalent to sampling from a Bernoulli approximate posterior q(ω0, ω1|D) (with standard Gaussian prior). MC Dropout has been used in various applications [60, 38, 28]. ... Appendix B.1 describes all models. All models were trained for 100 epochs using the Adam optimizer with default parameters. (A minimal MC Dropout sketch illustrating this setup appears below the table.) |
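
To make the Dataset Splits row concrete, the following is a minimal sketch of drawing one 600/147 train/test split from a 747-point IHDP realization. The function name, arguments, and seeding are illustrative assumptions, not taken from the authors' released code at https://github.com/OATML/ucate.

```python
import numpy as np

def split_ihdp_realization(x, t, y, n_train=600, seed=0):
    """Randomly split one 747-row IHDP realization into train (600) / test (147).

    Illustrative only: the name, arguments, and seed handling are assumptions,
    not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))              # shuffle all 747 indices
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    train = (x[train_idx], t[train_idx], y[train_idx])
    test = (x[test_idx], t[test_idx], y[test_idx])
    return train, test
```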
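
The Experiment Setup row describes MC Dropout: dropout plus L2 regularization during training, with dropout kept active at test time to draw samples from the approximate posterior q(ω0, ω1|D). The sketch below illustrates this idea in TensorFlow/Keras for a single outcome head; the layer widths, dropout probability, and weight decay are placeholder assumptions, not the paper's values (Appendix B.1 of the paper specifies the actual architectures).

```python
import numpy as np
import tensorflow as tf

def build_outcome_head(dropout_rate=0.1, weight_decay=1e-4):
    """One outcome head (e.g. the treated or control arm): dense layers with
    dropout and L2 (weight-decay) regularization. Widths and rates here are
    placeholders, not the paper's settings."""
    reg = tf.keras.regularizers.l2(weight_decay)
    return tf.keras.Sequential([
        tf.keras.layers.Dense(200, activation="elu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(200, activation="elu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1),
    ])

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout active at test time (training=True) so each forward pass
    is one draw from the approximate posterior; return the sample mean and
    standard deviation of the predictions."""
    samples = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Training mirrors the quoted setup: Adam with default parameters, 100 epochs.
# model = build_outcome_head()
# model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=100)
# mean, std = mc_dropout_predict(model, x_test)
```

With the dropout probability tuned as described in the quoted text, the test-time samples approximate draws from a Bernoulli approximate posterior; the spread of the per-arm predictions is what the paper's uncertainty-aware rejection rules act on.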