A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Authors: Fan Chen, Junyu Zhang, Zaiwen Wen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the corresponding lower bound up to an O((1-γ)^-1) factor. We propose the DPDL algorithm to solve the CMDP problem (1). ... DPDL provably finds an ϵ-optimal policy with zero constraint violation using Õ(min{|S||A|, |S|+I}·C / ((1-γ)^4 ϵ^2)) offline samples. We establish an information-theoretic sample complexity lower bound of Ω(min{|S||A|, |S|+I}·C / ((1-γ)^3 ϵ^2)) for offline CMDPs, indicating that DPDL is near-optimal up to an O((1-γ)^-1) factor.
Researcher Affiliation | Academia | Fan Chen, School of Mathematics, Peking University, chern@pku.edu.cn; Junyu Zhang, Department of Industrial Systems Engineering and Management, National University of Singapore, junyuz@nus.edu.sg; Zaiwen Wen, Beijing International Center for Mathematical Research and Center for Machine Learning Research, Peking University, wenzw@pku.edu.cn
Pseudocode | Yes | Algorithm 1: Deviation-controlled Primal-Dual Learning algorithm (DPDL)
Open Source Code | No | The paper does not contain any statement about making its source code openly available, nor a link to a code repository.
Open Datasets | No | The paper refers to using "offline data" or a "batch dataset D" but provides no concrete access information (link, DOI, specific citation with authors/year, or mention of an established benchmark dataset) for a publicly available dataset.
Dataset Splits | No | The paper is theoretical, focusing on algorithm design and sample-complexity analysis. It provides no information about dataset splits (e.g., percentages, sample counts, or splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware (GPU/CPU models, memory, etc.) used to run experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific software names with version numbers that would be needed to replicate an experiment.
Experiment Setup | No | The paper specifies constants and parameters for the DPDL algorithm only within its theoretical analysis (e.g., "Suppose that Algorithm 1 runs with η_t = 1/T, κ = 5φϵ, α_λ = (1/(1−γ))·√(ψ log I), α_V = ψ|S|, α_x = (1/(φ(1−γ)))·√(Nψ log ψ)"). It does not describe an experimental setup with hyperparameters for empirical evaluation.
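For context on the kind of method assessed above, the general Lagrangian primal-dual scheme for a tabular CMDP can be sketched as follows. This is an illustrative sketch of the generic technique only, not the paper's DPDL algorithm or its deviation control mechanism; the function name, the toy CMDP, and all step sizes and caps here are invented for illustration.

```python
import numpy as np

def primal_dual_cmdp(P, r, c, budget, gamma=0.9, iters=500, lr=0.1, lam_max=10.0):
    """Generic primal-dual sketch for a tabular CMDP (illustrative only).

    Maximizes reward subject to a discounted-cost constraint by ascending
    a dual variable `lam` on the constraint violation and (approximately)
    solving the Lagrangian MDP with reward r - lam * c at each round.
    """
    nS, nA = r.shape
    lam = 0.0
    pi = np.ones((nS, nA)) / nA
    for _ in range(iters):
        # Value iteration on the Lagrangian MDP, then greedy improvement.
        rl = r - lam * c
        Q = np.zeros((nS, nA))
        for _ in range(100):
            V = Q.max(axis=1)
            Q = rl + gamma * (P @ V)
        pi = np.eye(nA)[Q.argmax(axis=1)]  # one-hot greedy policy
        # Discounted cost of the greedy policy from state 0, via the
        # policy-induced transition matrix.
        Ppi = np.einsum('san,sa->sn', P, pi)
        cpi = np.einsum('sa,sa->s', c, pi)
        Vc = np.linalg.solve(np.eye(nS) - gamma * Ppi, cpi)
        # Dual ascent on the violation, projected onto [0, lam_max].
        lam = np.clip(lam + lr * (Vc[0] - budget), 0.0, lam_max)
    return pi, lam

# Toy 2-state, 2-action CMDP: action 1 earns reward 1 but incurs cost 1;
# every action keeps the agent in its current state.
nS, nA = 2, 2
P = np.zeros((nS, nA, nS))
for s in range(nS):
    for a in range(nA):
        P[s, a, s] = 1.0
r = np.array([[0.0, 1.0], [0.0, 1.0]])
c = np.array([[0.0, 1.0], [0.0, 1.0]])
pi, lam = primal_dual_cmdp(P, r, c, budget=5.0)
```

The cap `lam_max` on the dual variable mirrors the bounded-dual-domain assumption common in primal-dual CMDP analyses; keeping the multiplier in a compact set is one standard way such methods control deviation, though the paper's own mechanism may differ.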