Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provably Efficient CVaR RL in Low-rank MDPs

Authors: Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We prove that our algorithm achieves a sample complexity of $O\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where H is the length of each episode, A is the capacity of action space, and d is the dimension of representations. Computational-wise, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs.
Researcher Affiliation | Academia | Yulai Zhao (Princeton University); Wenhao Zhan (Princeton University); Xiaoyan Hu (The Chinese University of Hong Kong); Ho-fung Leung (Independent Researcher); Farzan Farnia (The Chinese University of Hong Kong); Wen Sun (Cornell University); Jason D. Lee (Princeton University)
Pseudocode | Yes | Algorithm 1 ELA and Algorithm 3 ELLA are provided as structured pseudocode blocks.
Open Source Code | No | The paper is theoretical and does not mention releasing open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not specify the use of any publicly available datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe a concrete experimental setup with hyperparameter values or training configurations.
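
For readers unfamiliar with the CVaR objective named in the Research Type entry, the following is a minimal sketch of an empirical CVaR estimate at risk tolerance tau. It is not taken from the paper, which releases no code. It assumes the common variational form CVaR_tau(X) = max_b { b - E[(b - X)^+] / tau }, which for an empirical sample reduces to roughly the mean of the worst tau-fraction of returns. The function name `empirical_cvar` and the sample data are hypothetical, chosen only for illustration.

```python
def empirical_cvar(returns, tau):
    """Empirical CVaR of a sample of returns at risk tolerance tau in (0, 1].

    Illustrative sketch (not from the paper), assuming the variational form
        CVaR_tau(X) = max_b { b - E[(b - X)^+] / tau }.
    The objective is concave and piecewise linear in b with kinks at the
    sample values, so it is enough to search over the sample values.
    """
    xs = [float(x) for x in returns]
    if not xs:
        raise ValueError("returns must be non-empty")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")

    n = len(xs)
    best = float("-inf")
    for b in set(xs):
        # Average shortfall of the returns below the threshold b.
        shortfall = sum(max(b - x, 0.0) for x in xs) / n
        best = max(best, b - shortfall / tau)
    return best


# Hypothetical sample: ten equally likely returns 0, 1, ..., 9.
sample = list(range(10))
worst_fifth = empirical_cvar(sample, 0.2)  # averages the two worst returns
risk_neutral = empirical_cvar(sample, 1.0)  # tau = 1 recovers the plain mean
```

With tau = 1 the estimate coincides with the ordinary expected return, while smaller tau places all weight on the lower tail, which is why the sample complexity quoted above grows as tau shrinks.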