Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Authors: Ming Yin, Yu Bai, Yu-Xiang Wang

NeurIPS 2021

Reproducibility assessment: each item below gives the variable, the result, and the LLM response supporting it.
Research Type: Theoretical. "In this paper, we propose Off-Policy Double Variance Reduction (OPDVR), a new variance-reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an ϵ-optimal policy with Õ(H^2/(d_m ϵ^2)) episodes of offline data in the finite-horizon stationary transition setting... Moreover, we establish an information-theoretic lower bound of Ω(H^2/(d_m ϵ^2)) which certifies that OPDVR is optimal up to logarithmic factors." (A LaTeX rendering of these bounds appears below the assessment items.)
Researcher Affiliation: Collaboration. Ming Yin (1,3), Yu Bai (2), and Yu-Xiang Wang (1). (1) Department of Computer Science, UC Santa Barbara; (2) Salesforce Research; (3) Department of Statistics and Applied Probability, UC Santa Barbara.
Pseudocode: Yes. Algorithm 1 (OPVRT): A Prototypical Off-Policy Variance Reduction Template; Algorithm 2 (OPDVR): Off-Policy Doubled Variance Reduction. (An illustrative sketch of a variance-reduced backup appears below the assessment items.)
Open Source Code: No. The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets: No. The paper refers to using a 'static offline dataset D' obtained by executing a 'pre-specified behavior policy µ', but does not name a publicly available dataset or provide any access information (link, DOI, or specific citation) for a dataset used for training.
Dataset Splits: No. The paper does not provide training, validation, or test dataset splits; it is a theoretical paper focusing on algorithms and sample complexity.
Hardware Specification: No. The paper does not mention any specific hardware used for running experiments; it is a theoretical paper.
Software Dependencies: No. The paper does not list software dependencies with version numbers; it focuses on theoretical algorithms and proofs.
Experiment Setup: No. The paper is theoretical and does not provide details of an experimental setup, such as hyperparameters or training configurations.
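For readability, here is a LaTeX rendering of the sample-complexity claims quoted in the Research Type row. The notation follows the abstract: H is the horizon, ϵ the target accuracy, and d_m the minimal marginal state-action occupancy of the behavior policy (our reading of the paper's notation; the precise definition is in the paper).

```latex
% Upper bound: number of offline episodes OPDVR needs to return an
% \epsilon-optimal policy in the finite-horizon, stationary-transition setting.
\[
  n_{\text{OPDVR}} \;=\; \widetilde{O}\!\left(\frac{H^2}{d_m\,\epsilon^2}\right)
\]

% Matching information-theoretic lower bound, certifying that OPDVR
% is optimal up to logarithmic factors.
\[
  n \;\geq\; \Omega\!\left(\frac{H^2}{d_m\,\epsilon^2}\right)
\]
```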
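The paper's Algorithms 1 and 2 are not reproduced here. As a rough illustration of the variance-reduction idea the Pseudocode row points to, below is a minimal Python sketch of a single SVRG-style variance-reduced Bellman backup. Every name (vr_bellman_backup, v_ref, the batch arguments) is hypothetical; this sketches the general technique under our stated assumptions, not the authors' algorithm verbatim.

```python
import numpy as np

def vr_bellman_backup(reward, next_big, next_small, v, v_ref, gamma=1.0):
    """One variance-reduced Bellman backup for a fixed (s, a) pair.

    SVRG-style split: the reference value P V_ref is estimated on a large
    batch of sampled next states (next_big), while the correction
    P (V - V_ref) is estimated on a small batch (next_small).  Once V is
    close to the reference, V - V_ref is small, so the correction term has
    low variance; that is the source of the sample savings.
    """
    anchor = np.mean(v_ref[next_big])                        # low-variance reference term
    correction = np.mean(v[next_small] - v_ref[next_small])  # cheap refinement
    return reward + gamma * (anchor + correction)

# Hypothetical usage: 6 states, next-state indices sampled from an
# offline dataset generated by some behavior policy.
v_ref = np.array([0.0, 0.5, 1.0, 0.2, 0.8, 0.3])
v = v_ref + 0.05                                 # current estimate, near the reference
big = np.random.default_rng(0).integers(0, 6, size=1000)
small = np.random.default_rng(1).integers(0, 6, size=50)
print(vr_bellman_backup(reward=1.0, next_big=big, next_small=small, v=v, v_ref=v_ref))
```

As we read the abstract, the "double" in OPDVR refers to applying the variance-reduction template in two stages; a backup like the one above would be the inner step of each stage.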