Policy Optimization for Continuous Reinforcement Learning

Authors: Hanyang Zhao, Wenpin Tang, David D. Yao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through numerical experiments, we demonstrate the effectiveness and advantages of our approach."
Researcher Affiliation | Academia | Hanyang Zhao (Columbia University, hz2684@columbia.edu); Wenpin Tang (Columbia University, wt2319@columbia.edu); David D. Yao (Columbia University, yao@columbia.edu)
Pseudocode | Yes | Algorithm 1 CPG: Policy Gradient with exp(β) random rollout; Algorithm 2 CPPO: PPO with adaptive penalty constant; Algorithm 3 CPPO: PPO with adaptive penalty constant (linear KL-divergence). See the sketch after this table.
Open Source Code | No | The paper does not state that source code is released and provides no link to a code repository for the described methodology.
Open Datasets | No | The experiments use simulated continuous control environments (LQ stochastic control and a two-dimensional optimal pair-trading problem) defined by specified model parameters; no publicly available dataset is cited or linked.
Dataset Splits | No | The paper describes policy evaluation and training steps within the continuous RL algorithms (e.g., updating critic parameters) but specifies no training/validation/test splits, since the experiments are continuous control tasks rather than fixed, pre-split datasets.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes algorithm implementations and theoretical results but lists no software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | Table 1: Hyper-parameter values for Example 1; Table 2: Hyper-parameter values for Example 2.
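
The paper's Algorithms 2 and 3 (CPPO) are described as PPO with an adaptive penalty constant. As a point of reference only, the following is a minimal NumPy sketch of the standard discrete-time adaptive-KL-penalty rule (penalized surrogate objective plus a multiplicative update of the penalty constant). The function names, the KL target of 0.01, and the 1.5x/2x adaptation factors are illustrative assumptions, not the authors' CPPO implementation.

```python
# Minimal sketch of a PPO-style update with an adaptive KL penalty.
# Names and constants are illustrative assumptions, not the paper's CPPO code.
import numpy as np

def surrogate_objective(logp_new, logp_old, advantages, beta):
    """Penalized surrogate: mean(ratio * A) - beta * KL, with a crude sample KL estimate."""
    ratio = np.exp(logp_new - logp_old)
    kl = np.mean(logp_old - logp_new)  # rough estimate of KL(old || new)
    return np.mean(ratio * advantages) - beta * kl, kl

def adapt_beta(beta, kl, kl_target=0.01):
    """Tighten the penalty if KL overshoots the target, relax it if KL undershoots."""
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta

# Toy usage with random numbers standing in for real rollout data.
rng = np.random.default_rng(0)
logp_old = rng.normal(-1.0, 0.1, size=256)
logp_new = logp_old + rng.normal(0.0, 0.05, size=256)
adv = rng.normal(0.0, 1.0, size=256)

beta = 1.0
objective, kl = surrogate_objective(logp_new, logp_old, adv, beta)
beta = adapt_beta(beta, kl)
print(f"surrogate objective={objective:.4f}, KL estimate={kl:.4f}, new beta={beta:.2f}")
```

In the standard PPO recipe the penalty constant β is increased when the realized KL exceeds its target and decreased when it falls well below; the paper's CPPO variants adapt a penalty constant in an analogous way for the continuous-time objective, with Algorithm 3 using a linear KL-divergence term.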