Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
Authors: Liyuan Xu, Heishiro Kanagawa, Arthur Gretton
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance. In Section 4, we empirically show that DFPV outperforms other PCL methods in several examples. We further apply PCL methods to the off-policy evaluation problem in a confounded bandit setting, which aims to estimate the average reward of a new policy given data with confounding bias. We discuss the setting in Section 3, and show the superiority of DFPV in experiments in Section 4. |
| Researcher Affiliation | Academia | Liyuan Xu Gatsby Unit liyuan.jo.19@ucl.ac.uk |
| Pseudocode | Yes | Algorithm 1 Deep Feature Proxy Causal Learning |
| Open Source Code | Yes | The code is included in the supplemental material. |
| Open Datasets | Yes | Our second structural function estimation experiment considers high-dimensional treatment variables. We test this using the dSprites dataset [19], which is an image dataset described by five latent parameters (shape, scale, rotation, posX and posY). [19] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dSprites: Disentanglement testing sprites dataset, 2017. URL https://github.com/deepmind/dsprites-dataset/. |
| Dataset Splits | Yes | If observations of (A,Y,Z,W) are given for both stages, we can evaluate the out-of-sample loss of Stage 1 using Stage 2 data and vice versa, and these losses can be used for hyper-parameter tuning of λ1,λ2 (Appendix A). ... We tuned the regularizers λ1,λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. ... We evenly split the data for Stage 1, Stage 2, and policy evaluation (i.e. we set n = m = n). |
| Hardware Specification | Yes | All experiments can be run in a few minutes on Intel(R) Xeon(R) CPU E5-2698 v4 2.20GHz. |
| Software Dependencies | No | The experiments are implemented using PyTorch [27]. However, no specific version number for PyTorch or any other software dependency is provided in the paper. |
| Experiment Setup | Yes | We tuned the regularizers λ1,λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. We ran 20 simulations for each setting. Input: ... Regularization parameters (λ1,λ2). Learning rate α. |
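The split-and-swap tuning scheme quoted above (evaluate Stage 1's out-of-sample loss on the Stage 2 half, and vice versa, to pick λ1, λ2) can be sketched with plain ridge regressions. This is an illustrative stand-in only: the paper's DFPV learns neural network features, whereas here the stages are linear, and the function names, the λ grid, and the toy data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge(X, Y, lam):
    """Closed-form ridge solution: (X'X + n*lam*I)^{-1} X'Y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

def tune_lambda(X_fit, Y_fit, X_val, Y_val, grid):
    """Fit on one stage's half of the data, score the out-of-sample
    squared loss on the other half, and keep the best regularizer."""
    best_lam, best_loss = None, np.inf
    for lam in grid:
        beta = ridge(X_fit, Y_fit, lam)
        loss = np.mean((X_val @ beta - Y_val) ** 2)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam

# Toy linear data standing in for one regression stage.
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Even split, as in the paper: one half per stage.
X1, Y1 = X[: n // 2], Y[: n // 2]   # Stage 1 half
X2, Y2 = X[n // 2 :], Y[n // 2 :]   # Stage 2 half (held out here)

lam1 = tune_lambda(X1, Y1, X2, Y2, grid=[1e-4, 1e-2, 1.0])
```

In the paper's setting the same swap is run in the other direction (fit on the Stage 2 half, validate on the Stage 1 half) to tune λ2.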