Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
Authors: Liyuan Xu, Heishiro Kanagawa, Arthur Gretton
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance. In Section 4, we empirically show that DFPV outperforms other PCL methods in several examples. We further apply PCL methods to the off-policy evaluation problem in a confounded bandit setting, which aims to estimate the average reward of a new policy given data with confounding bias. We discuss the setting in Section 3, and show the superiority of DFPV in experiments in Section 4. |
| Researcher Affiliation | Academia | Liyuan Xu Gatsby Unit liyuan.jo.19@ucl.ac.uk |
| Pseudocode | Yes | Algorithm 1 Deep Feature Proxy Causal Learning |
| Open Source Code | Yes | The code is included in the supplemental material. |
| Open Datasets | Yes | Our second structural function estimation experiment considers high-dimensional treatment variables. We test this using the dSprites dataset [19], which is an image dataset described by five latent parameters (shape, scale, rotation, posX and posY). [19] L. Matthey, I. Higgins, D. Hassabis, and A. Lerchner. dSprites: Disentanglement testing sprites dataset, 2017. URL https://github.com/deepmind/dsprites-dataset/. |
| Dataset Splits | Yes | If observations of (A,Y,Z,W) are given for both stages, we can evaluate the out-of-sample loss of Stage 1 using Stage 2 data and vice versa, and these losses can be used for hyper-parameter tuning of λ1,λ2 (Appendix A). ... We tuned the regularizers λ1,λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. ... We evenly split the data for Stage 1, Stage 2, and policy evaluation (i.e. we set n = m = n). |
| Hardware Specification | Yes | All experiments can be run in a few minutes on Intel(R) Xeon(R) CPU E5-2698 v4 2.20GHz. |
| Software Dependencies | No | The experiments are implemented using PyTorch [27]. However, no specific version number for PyTorch or any other software dependency is provided in the paper. |
| Experiment Setup | Yes | We tuned the regularizers λ1,λ2 as discussed in Appendix A, with the data evenly split for Stage 1 and Stage 2. We ran 20 simulations for each setting. Input: ... Regularization parameters (λ1,λ2). Learning rate α. |
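The split-and-swap tuning scheme quoted above (evaluate Stage 1's out-of-sample loss on the Stage 2 half, and vice versa, to pick λ1, λ2) can be sketched with plain ridge regressions. This is an illustrative stand-in only: the paper's DFPV learns neural network features, whereas here the stages are linear, and the function names, the λ grid, and the toy data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge(X, Y, lam):
    """Closed-form ridge solution: (X'X + n*lam*I)^{-1} X'Y."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)

def tune_lambda(X_fit, Y_fit, X_val, Y_val, grid):
    """Fit on one stage's half of the data, score the out-of-sample
    squared loss on the other half, and keep the best regularizer."""
    best_lam, best_loss = None, np.inf
    for lam in grid:
        beta = ridge(X_fit, Y_fit, lam)
        loss = np.mean((X_val @ beta - Y_val) ** 2)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam

# Toy linear data standing in for one regression stage.
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Even split, as in the paper: one half per stage.
X1, Y1 = X[: n // 2], Y[: n // 2]   # Stage 1 half
X2, Y2 = X[n // 2 :], Y[n // 2 :]   # Stage 2 half (held out here)

lam1 = tune_lambda(X1, Y1, X2, Y2, grid=[1e-4, 1e-2, 1.0])
```

In the paper's setting the same swap is run in the other direction (fit on the Stage 2 half, validate on the Stage 1 half) to tune λ2.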