A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Authors: Fan Chen, Junyu Zhang, Zaiwen Wen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation, and its sample complexity matches the corresponding lower bound up to an O((1-γ)^-1) factor. We propose the DPDL algorithm to solve the CMDP problem (1). ... DPDL provably finds an ϵ-optimal policy with zero constraint violation using Õ(min{|S||A|, |S|+I}·C / ((1-γ)^4 ϵ^2)) offline samples. We establish an information-theoretic sample complexity lower bound of Ω(min{|S||A|, |S|+I}·C / ((1-γ)^3 ϵ^2)) for offline CMDPs, indicating that DPDL is near-optimal up to an O((1-γ)^-1) factor.
Researcher Affiliation | Academia | Fan Chen, School of Mathematics, Peking University, chern@pku.edu.cn; Junyu Zhang, Department of Industrial Systems Engineering and Management, National University of Singapore, junyuz@nus.edu.sg; Zaiwen Wen, Beijing International Center for Mathematical Research and Center for Machine Learning Research, Peking University, wenzw@pku.edu.cn
Pseudocode | Yes | Algorithm 1: Deviation-controlled Primal-Dual Learning algorithm (DPDL)
Open Source Code | No | The paper does not contain any statement about making its source code openly available, nor a link to a code repository.
Open Datasets | No | The paper refers to using "offline data" or a "batch dataset D" but provides no concrete access information (link, DOI, specific citation with authors/year, or mention of an established benchmark dataset) for a publicly available dataset.
Dataset Splits | No | The paper is theoretical, focusing on algorithm design and sample-complexity analysis. It provides no information about dataset splits (e.g., percentages, sample counts, or splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware (GPU/CPU models, memory, etc.) used to run experiments.
Software Dependencies | No | The paper is theoretical and does not provide specific software names with version numbers that would be needed to replicate an experiment.
Experiment Setup | No | The paper specifies constants and parameters for the DPDL algorithm only within its theoretical analysis (e.g., "Suppose that Algorithm 1 runs with η_t = 1/T, κ = 5φϵ, α_λ = (1/(1−γ))·√(ψ log I), α_V = ψ|S|, α_x = (1/(φ(1−γ)))·√(Nψ log ψ)"). It does not describe an experimental setup with hyperparameters for empirical evaluation.
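For context on the kind of method assessed above, the general Lagrangian primal-dual scheme for a tabular CMDP can be sketched as follows. This is an illustrative sketch of the generic technique only, not the paper's DPDL algorithm or its deviation control mechanism; the function name, the toy CMDP, and all step sizes and caps here are invented for illustration.

```python
import numpy as np

def primal_dual_cmdp(P, r, c, budget, gamma=0.9, iters=500, lr=0.1, lam_max=10.0):
    """Generic primal-dual sketch for a tabular CMDP (illustrative only).

    Maximizes reward subject to a discounted-cost constraint by ascending
    a dual variable `lam` on the constraint violation and (approximately)
    solving the Lagrangian MDP with reward r - lam * c at each round.
    """
    nS, nA = r.shape
    lam = 0.0
    pi = np.ones((nS, nA)) / nA
    for _ in range(iters):
        # Value iteration on the Lagrangian MDP, then greedy improvement.
        rl = r - lam * c
        Q = np.zeros((nS, nA))
        for _ in range(100):
            V = Q.max(axis=1)
            Q = rl + gamma * (P @ V)
        pi = np.eye(nA)[Q.argmax(axis=1)]  # one-hot greedy policy
        # Discounted cost of the greedy policy from state 0, via the
        # policy-induced transition matrix.
        Ppi = np.einsum('san,sa->sn', P, pi)
        cpi = np.einsum('sa,sa->s', c, pi)
        Vc = np.linalg.solve(np.eye(nS) - gamma * Ppi, cpi)
        # Dual ascent on the violation, projected onto [0, lam_max].
        lam = np.clip(lam + lr * (Vc[0] - budget), 0.0, lam_max)
    return pi, lam

# Toy 2-state, 2-action CMDP: action 1 earns reward 1 but incurs cost 1;
# every action keeps the agent in its current state.
nS, nA = 2, 2
P = np.zeros((nS, nA, nS))
for s in range(nS):
    for a in range(nA):
        P[s, a, s] = 1.0
r = np.array([[0.0, 1.0], [0.0, 1.0]])
c = np.array([[0.0, 1.0], [0.0, 1.0]])
pi, lam = primal_dual_cmdp(P, r, c, budget=5.0)
```

The cap `lam_max` on the dual variable mirrors the bounded-dual-domain assumption common in primal-dual CMDP analyses; keeping the multiplier in a compact set is one standard way such methods control deviation, though the paper's own mechanism may differ.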