Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Authors: Lingxiao Wang, Zhuoran Yang, Zhaoran Wang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | To tackle such challenges, we propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner. DOVI explicitly adjusts for the confounding bias in the observational data, where the confounders are partially observed or unobserved. In both cases, such adjustments allow us to construct the bonus based on a notion of information gain, which takes into account the amount of information acquired from the offline setting. In particular, we prove that the regret of DOVI is smaller than the optimal regret achievable in the pure online setting when the confounded observational data are informative upon the adjustments. (An illustrative sketch of such an adjusted optimistic value iteration follows this table.)
Researcher Affiliation | Academia | Lingxiao Wang (Northwestern University, lwang@u.northwestern.edu); Zhuoran Yang (Princeton University, zy6@princeton.edu); Zhaoran Wang (Northwestern University, zhaoranwang@gmail.com)
Pseudocode | Yes | Algorithm 1: Deconfounded Optimistic Value Iteration (DOVI) for Confounded MDP
Open Source Code | No | The paper does not provide any information about open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not reference or provide access information for any publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not specify training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not report hardware specifications, as it contains no experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup or specific hyperparameters.
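The Research Type row above quotes the abstract's high-level description of DOVI: adjust for the confounding in the observational data, then run optimistic value iteration with a bonus that reflects how much information the adjusted offline data already provide. The sketch below is a hypothetical, heavily simplified illustration of those two ingredients for a tabular MDP with a fully observed confounder. It is not the paper's Algorithm 1 (which builds its bonus from a notion of information gain and also covers unobserved confounders); the function names, array shapes, and constants such as backdoor_transition, beta, and the assumed-known confounder marginal p_w are all illustrative assumptions.

```python
# Hypothetical sketch only: NOT the paper's DOVI (Algorithm 1). It illustrates,
# for a tabular MDP with a fully observed confounder, (i) a backdoor adjustment
# that converts confounded observational counts into interventional transition
# estimates and (ii) optimistic value iteration whose bonus shrinks as offline
# plus online counts grow. All names, sizes, and constants are assumptions.

import numpy as np

S, A, W, H = 5, 3, 2, 10           # states, actions, confounder values, horizon
rng = np.random.default_rng(0)

# Offline, confounded counts N_obs[w, s, a, s'] with the confounder w recorded.
N_obs = rng.integers(0, 20, size=(W, S, A, S)).astype(float)
p_w = np.full(W, 1.0 / W)          # assumed known marginal of the confounder


def backdoor_transition(N_obs, p_w):
    """Backdoor-adjusted estimate of P(s' | s, do(a)) from confounded counts."""
    P = np.zeros((S, A, S))
    for w in range(W):
        counts = N_obs[w]                              # shape (S, A, S')
        totals = counts.sum(axis=-1, keepdims=True)
        cond = np.divide(counts, totals,
                         out=np.full_like(counts, 1.0 / S),
                         where=totals > 0)             # P_hat(s' | s, a, w)
        P += p_w[w] * cond                             # sum_w P(s'|s,a,w) P(w)
    return P


def optimistic_value_iteration(P_hat, R, N_eff, N_online, beta=1.0):
    """Finite-horizon value iteration with a count-based optimism bonus."""
    Q = np.zeros((H, S, A))
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        bonus = beta / np.sqrt(np.maximum(N_eff + N_online, 1.0))
        Q[h] = np.minimum(R + P_hat @ V[h + 1] + bonus, H)   # optimism, clipped
        V[h] = Q[h].max(axis=-1)
    return Q, V


P_hat = backdoor_transition(N_obs, p_w)
N_eff = N_obs.sum(axis=(0, -1))     # effective offline sample count per (s, a)
R = rng.uniform(size=(S, A))        # reward assumed known, for simplicity

Q, V = optimistic_value_iteration(P_hat, R, N_eff, N_online=np.zeros((S, A)))
print("optimistic value of the initial states:", np.round(V[0], 2))
```

In this toy version the count-based bonus merely stands in for the paper's information-gain quantity: the larger the effective offline count N_eff after adjustment, the smaller the optimism bonus, which mirrors the abstract's claim that informative observational data reduce the regret relative to the pure online setting.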