Reinforcement Learning for Cost-Aware Markov Decision Processes

Authors: Wesley Suttle, Kaiqing Zhang, Zhuoran Yang, Ji Liu, David Kraemer

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical experiments that illustrate the convergence results obtained in the preceding. In addition to providing strong support for our theory, our simulations suggest both CARVI Q-learning and CAAC enjoy promising performance and merit further study.
Researcher Affiliation | Academia | (1) Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA. (2) Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois, USA. (3) Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, USA. (4) Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, New York, USA.
Pseudocode | Yes | Algorithm 1: CARVI Q-learning
Open Source Code | No | The paper discusses external tools and environments (OpenAI Gym) and references supplementary materials for empirical results, but it neither states that source code for the proposed algorithms will be released nor provides a link to it.
Open Datasets | No | We considered two different sizes of CAMDP for our experiments: |S| = |A| = 5 and |S| = |A| = 10. ... For each size and reward/cost combination, we randomly generated a transition kernel P(· | s, a) that satisfies Assumption 3, completing the specification of the corresponding CAMDP. ... Deep CARVI Q-learning. For this paper we also implemented a version of CARVI Q-learning using neural networks for the Q function approximators and tested it on a cost-aware modification of the classic Mountain Car control environment (Moore, 1990) provided by OpenAI’s Gym RL testbed (Brockman et al., 2016).
Dataset Splits | No | The paper describes running 15 independent replications and plotting learning curves, but it gives no train/validation/test splits and no cross-validation details.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) for the machines used to run the experiments.
Software Dependencies | No | The paper mentions 'OpenAI’s Gym RL testbed' and refers to general components such as 'neural networks' and 'linear function approximators', but it does not name any software with version numbers (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | No | The paper states: 'Hyperparameters were determined through experimentation and are included in the supplementary material.' The hyperparameters are therefore not given in the main text of the paper.
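Because no code or dataset is released, anyone reproducing the tabular experiments has to generate the random CAMDPs themselves. The following is a minimal sketch of what that could look like, not the authors' procedure. The function name `random_camdp`, the Dirichlet sampling of the transition kernel, and the uniform reward and cost ranges are all assumptions made for illustration. Assumption 3 is not stated in this summary, so the sketch does not check it; it only guarantees that each P(· | s, a) is a valid probability distribution.

```python
import numpy as np


def random_camdp(n_states, n_actions, seed=0):
    """Sample a random tabular CAMDP (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Transition kernel: P[s, a, :] is a Dirichlet(1, ..., 1) sample,
    # i.e. a probability vector over next states for each (s, a) pair.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    # Placeholder reward table; the paper's reward/cost combinations
    # are not listed in this summary.
    reward = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    # Placeholder cost table, kept strictly positive (an assumption) so that
    # a reward-to-cost ratio would be well defined.
    cost = rng.uniform(0.5, 1.5, size=(n_states, n_actions))
    return P, reward, cost


# The larger of the two sizes mentioned in the table: |S| = |A| = 10.
P, reward, cost = random_camdp(10, 10, seed=0)
```

Calling `random_camdp(5, 5)` gives the smaller configuration. Fixing `seed` makes a replication repeatable, and varying it is one way to produce independent replications.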