Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

Authors: Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

NeurIPS 2021

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the supporting LLM response quoted from the paper.
Research Type: Experimental
LLM Response: "We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high-dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance." ... "In Section 4, we empirically show that DFPV outperforms other PCL methods in several examples. We further apply PCL methods to the off-policy evaluation problem in a confounded bandit setting, which aims to estimate the average reward of a new policy given data with confounding bias. We discuss the setting in Section 3, and show the superiority of DFPV in experiments in Section 4."

Researcher Affiliation: Academia
LLM Response: "Liyuan Xu, Gatsby Unit, liyuan.jo.19@ucl.ac.uk"

Pseudocode: Yes
LLM Response: "Algorithm 1 Deep Feature Proxy Causal Learning"

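The paper's Algorithm 1 itself is not reproduced in this report. As a rough illustration only, the following is a minimal sketch of the two-stage structure that "Deep Feature Proxy Causal Learning" names: Stage 1 regresses outcome-proxy features on (treatment, treatment-proxy) features; Stage 2 regresses the outcome on treatment features crossed with the predicted proxy features. The toy data-generating process, the fixed polynomial feature maps, and the closed-form ridge solves below are our own simplifications; the actual algorithm learns deep feature maps by gradient descent (hence the learning rate α among its inputs).

```python
# Hedged sketch of the two-stage DFPV structure; toy example, not the
# paper's algorithm. Feature maps are fixed polynomials rather than
# learned deep networks, and both stages are solved in closed form.
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge regression: argmin_B ||Y - X @ B||^2 + lam ||B||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
n = m = 500                # even split: Stage 1 / Stage 2 sample sizes
lam1, lam2 = 0.1, 0.1      # regularizers lambda_1, lambda_2

def sample(k):
    """Toy confounded data; U is unobserved, Z and W are its proxies."""
    U = rng.normal(size=(k, 1))
    Z = U + 0.1 * rng.normal(size=(k, 1))           # treatment-inducing proxy
    W = U + 0.1 * rng.normal(size=(k, 1))           # outcome-inducing proxy
    A = U + rng.normal(size=(k, 1))                 # confounded treatment
    Y = A ** 2 + U + 0.1 * rng.normal(size=(k, 1))  # outcome; true effect a^2
    return A, Z, W, Y

A1, Z1, W1, _ = sample(n)   # Stage 1 half
A2, Z2, _, Y2 = sample(m)   # Stage 2 half

def feat_W(W):     return np.hstack([np.ones_like(W), W])
def feat_AZ(A, Z): return np.hstack([np.ones_like(A), A, A ** 2, Z])
def feat_A(A):     return np.hstack([np.ones_like(A), A, A ** 2])

# Stage 1: regress outcome-proxy features on (treatment, treatment-proxy) features.
V = ridge(feat_AZ(A1, Z1), feat_W(W1), lam1)

# Stage 2: regress Y on treatment features crossed with predicted proxy features.
W_hat = feat_AZ(A2, Z2) @ V
X2 = np.einsum("ni,nj->nij", feat_A(A2), W_hat).reshape(m, -1)
u = ridge(X2, Y2, lam2)

def f_hat(a, W_samples):
    """Structural function estimate: bridge function averaged over observed W."""
    k = len(W_samples)
    phi_A = feat_A(np.full((k, 1), float(a)))
    X = np.einsum("ni,nj->nij", phi_A, feat_W(W_samples)).reshape(k, -1)
    return float(np.mean(X @ u))

print(f_hat(1.0, W1))   # close to the true effect 1.0, up to sampling error
```

In the paper, the three feature maps are neural networks trained jointly with the ridge weights; fixing them here keeps the two-stage logic self-contained.
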
Open Source Code: Yes
LLM Response: "The code is included in the supplemental material."

Open Datasets: Yes
LLM Response: "Our second structural function estimation experiment considers high-dimensional treatment variables. We test this using the dSprites dataset [19], which is an image dataset described by five latent parameters (shape, scale, rotation, posX and posY)."
Reference: [19] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dSprites: Disentanglement testing Sprites dataset, 2017. URL https://github.com/deepmind/dsprites-dataset/.

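For convenience, a hedged loading snippet for the dSprites archive at the URL above; the file name and array keys below match the public repository at the time of writing and should be verified against it.

```python
# Loading sketch for the dSprites dataset cited as [19]. The .npz file name
# and keys are taken from https://github.com/deepmind/dsprites-dataset/ and
# may change; verify against the repository before use.
import numpy as np

path = "dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz"
data = np.load(path, allow_pickle=True, encoding="latin1")

imgs = data["imgs"]                  # (737280, 64, 64) binary images
latents = data["latents_values"]     # (737280, 6): color, shape, scale,
                                     # orientation, posX, posY; color is
                                     # constant, leaving the five latent
                                     # parameters named in the quote above
print(imgs.shape, latents.shape)
```
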
Dataset Splits: Yes
LLM Response: "If observations of (A, Y, Z, W) are given for both stages, we can evaluate the out-of-sample loss of Stage 1 using Stage 2 data and vice versa, and these losses can be used for hyper-parameter tuning of λ1, λ2 (Appendix A). ... We tuned the regularizers λ1, λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. ... We evenly split the data for Stage 1, Stage 2, and policy evaluation (i.e., we set n = m = n′)."

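A concrete, hedged sketch of the split-and-swap tuning idea in the quote, shown for λ1 only (λ2 is tuned analogously, with the roles of the two halves swapped). The synthetic data, the Stage 1 feature map, and the grid are our own placeholders, not the procedure of the paper's Appendix A.

```python
# Hedged sketch: fit Stage 1 on its own half for each candidate lambda_1,
# then score it by out-of-sample loss on the other half, as the quote
# describes. Data and features are illustrative placeholders.
import numpy as np

def ridge(X, Y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

rng = np.random.default_rng(1)
N = 1000
U = rng.normal(size=(N, 1))                 # unobserved confounder
A = U + rng.normal(size=(N, 1))             # treatment
Z = U + 0.1 * rng.normal(size=(N, 1))       # treatment-inducing proxy
W = U + 0.1 * rng.normal(size=(N, 1))       # outcome-inducing proxy

feats = np.hstack([np.ones((N, 1)), A, Z])  # Stage 1 regressors phi(A, Z)
half = N // 2                               # even split, as in the paper
X1, X2 = feats[:half], feats[half:]
W1, W2 = W[:half], W[half:]

best_lam1, best_loss = None, np.inf
for lam1 in [1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    V = ridge(X1, W1, lam1)                 # Stage 1 fit on its own half
    oos = np.mean((W2 - X2 @ V) ** 2)       # out-of-sample loss on other half
    if oos < best_loss:
        best_lam1, best_loss = lam1, oos
print(best_lam1, best_loss)
```
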
Hardware Specification: Yes
LLM Response: "All experiments can be run in a few minutes on Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz."

Software Dependencies: No
LLM Response: "The experiments are implemented using PyTorch [27]." However, no specific version number for PyTorch or any other software dependency is provided in the paper.

Experiment Setup: Yes
LLM Response: "We tuned the regularizers λ1, λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. We ran 20 simulations for each setting." ... "Input: ... Regularization parameters (λ1, λ2). Learning rate α."
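
A hedged sketch of the repetition protocol in the quote: 20 simulations per setting, each with a fresh seed, with mean and standard deviation reported. Here run_one is a hypothetical placeholder for one full pass of data generation, a DFPV fit (with λ1, λ2 and learning rate α), and evaluation.

```python
# Hedged sketch of "We ran 20 simulations for each setting"; run_one is a
# hypothetical placeholder, not the paper's experiment code.
import numpy as np

def run_one(seed: int) -> float:
    rng = np.random.default_rng(seed)
    # Placeholder: substitute one data draw + DFPV fit + test error here.
    return float(rng.normal(loc=1.0, scale=0.1))

errors = np.array([run_one(seed) for seed in range(20)])
print(f"out-of-sample MSE: {errors.mean():.3f} +/- {errors.std():.3f}")
```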