Discretizing Continuous Action Space for On-Policy Optimization

Authors: Yunhao Tang, Shipra Agrawal

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5, through extensive experiments we show how the discrete/ordinal policy improves upon current on-policy optimization baselines and related prior works, especially on high-dimensional tasks with complex dynamics. Our experiments aim to address the following questions: (a) Does discrete policy improve the performance of baseline algorithms on benchmark continuous control tasks? (b) Does the ordinal architecture further improve upon discrete policy? (c) How sensitive is the performance to hyper-parameters, particularly to the number of bins per action dimension? All benchmark comparison results are presented in plots (Figures 2, 3) or tables (Tables 1, 2). (A minimal discretization sketch follows this table.)
Researcher Affiliation | Academia | Yunhao Tang and Shipra Agrawal, Columbia University, New York, NY, USA ({yt2541, sa3305}@columbia.edu)
Pseudocode | No | The paper describes algorithms (TRPO, PPO) using mathematical formulations, but it does not include any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states that the algorithms used are 'originally implemented in OpenAI baselines (Dhariwal et al., 2017)' but does not provide a link or an explicit statement about the availability of the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | We evaluate on benchmark tasks in gym MuJoCo (Brockman et al., 2016; Todorov, 2008), rllab (Duan et al., 2016), roboschool (Schulman et al., 2015a) and Box2D.
Dataset Splits | No | The paper refers to training for a 'fixed number of time steps' and reports results averaged over 'random seeds' and 'training iterations' on benchmark tasks, but it does not provide specific percentages, sample counts, or a methodology for training/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper states that algorithms are 'originally implemented in OpenAI baselines (Dhariwal et al., 2017)', but it does not provide version numbers for programming languages, libraries, or other software dependencies required to replicate the experiments.
Experiment Setup | No | The 'Implementation Details' section states, 'We leave all hyper-parameter settings in Appendix A.', indicating that specific experimental setup details are not provided in the main text of the paper.
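Since no source code is released, the sketch below is a purely illustrative aid rather than the authors' implementation: it shows one common way to discretize a Box-style continuous action space into a fixed number of evenly spaced bins per dimension, the hyper-parameter the experiments vary. The function names and the choice of 11 bins are assumptions made for this example only.

```python
import numpy as np

def make_action_grid(low, high, num_bins):
    """Build evenly spaced atoms per action dimension for a Box action space.

    Returns an array of shape (action_dim, num_bins)."""
    return np.stack(
        [np.linspace(l, h, num_bins) for l, h in zip(low, high)], axis=0
    )

def bins_to_action(bin_indices, grid):
    """Map a vector of per-dimension bin indices back to a continuous action."""
    return grid[np.arange(grid.shape[0]), bin_indices]

# Example: a 3-dimensional action space in [-1, 1]^3 with 11 bins per dimension
# (an assumed setting; the paper studies sensitivity to the number of bins).
low, high = np.full(3, -1.0), np.full(3, 1.0)
grid = make_action_grid(low, high, num_bins=11)
sampled_bins = np.array([0, 5, 10])        # e.g. indices sampled per dimension
print(bins_to_action(sampled_bins, grid))  # -> [-1.  0.  1.]
```

In the paper's approach, a factorized categorical (or ordinal) distribution over such per-dimension bins replaces the usual Gaussian policy in TRPO/PPO; the mapping above only illustrates how sampled bin indices would be converted back into a continuous control signal.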