IPO: Interior-Point Policy Optimization under Constraints

Authors: Yongshuai Liu, Jiaxin Ding, Xin Liu (pp. 4940-4947)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction.
Researcher Affiliation | Academia | Yongshuai Liu, Jiaxin Ding, Xin Liu, University of California, Davis, {yshliu, jxding, xinliu}@ucdavis.edu
Pseudocode | Yes | Algorithm 1: The procedure of IPO (a sketch of the corresponding objective appears after this table).
Open Source Code | No | No explicit statement or link providing concrete access to the source code for the described methodology was found.
Open Datasets | Yes | We conduct experiments and compare IPO with CPO and PDO in various scenarios: three tasks in the Mujoco simulator (Point-Gather, Point-Circle (Achiam et al. 2017), Half Cheetah-Safe (Chow et al. 2019)) and a grid-world task (Mars-Rover) inspired by (Chow et al. 2015).
Dataset Splits | No | The paper describes sampling N trajectories and running experiments multiple times with different random seeds, but it does not provide dataset splits (e.g., percentages or counts for training, validation, and test sets) in the conventional supervised-learning sense (see the multi-seed aggregation sketch after this table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions software components such as PPO, TRPO, the Adam optimizer, and the Mujoco simulator, but does not provide version numbers for any of them.
Experiment Setup | No | The paper discusses hyperparameters such as the PPO clip rate r, the logarithmic-barrier hyperparameter t, and learning rates, but it reports only their tuning procedure or ranges, not the concrete values used in the main experiments (an illustrative, hypothetical configuration follows this table).
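On the Pseudocode row: Algorithm 1 of the paper optimizes PPO's clipped surrogate augmented with a logarithmic barrier on the expected constraint cost. The function below is a minimal sketch of that objective under an assumed PyTorch setting; the function name, its arguments, and the default clip_rate and t values are illustrative assumptions, not details taken from the paper.

    import torch

    def ipo_surrogate(ratio, adv, cost_estimate, cost_limit, clip_rate=0.2, t=50.0):
        """Sketch of an IPO-style objective (to be maximized): PPO's clipped
        surrogate plus a log barrier that keeps the estimated constraint cost
        below cost_limit. clip_rate and t are placeholder values."""
        # Standard PPO clipped surrogate on the reward advantage.
        clipped_ratio = torch.clamp(ratio, 1.0 - clip_rate, 1.0 + clip_rate)
        reward_term = torch.min(ratio * adv, clipped_ratio * adv).mean()
        # Interior-point log barrier: finite only while the constraint is
        # strictly satisfied (cost_estimate < cost_limit); a larger t makes
        # the barrier a tighter approximation of the hard constraint.
        barrier_term = torch.log(cost_limit - cost_estimate) / t
        return reward_term + barrier_term

Ascending this objective lets the barrier's gradient grow as the policy approaches the constraint boundary, which is the mechanism that distinguishes IPO from dual-variable methods such as PDO.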
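On the Dataset Splits row: since evaluation proceeds by on-policy rollouts rather than fixed train/validation/test splits, the protocol amounts to repeating training under different random seeds and aggregating the outcomes. A minimal sketch of that aggregation, where run_experiment is a hypothetical callable standing in for one complete training run:

    import statistics

    def evaluate_over_seeds(run_experiment, seeds=(0, 1, 2, 3, 4)):
        # run_experiment(seed=...) is assumed to return a final
        # (reward, constraint_cost) pair for one training run.
        rewards, costs = zip(*(run_experiment(seed=s) for s in seeds))
        return (statistics.mean(rewards), statistics.stdev(rewards),
                statistics.mean(costs), statistics.stdev(costs))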
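On the Experiment Setup row: a reproduction would need to fix the unreported hyperparameters explicitly. The record below only illustrates the kind of configuration that would close that gap; every value is a placeholder assumption and none of them comes from the paper.

    # All values are hypothetical placeholders; the paper does not report them.
    ipo_config = {
        "clip_rate": 0.2,                  # PPO clip rate r (assumed)
        "barrier_t": 50.0,                 # logarithmic-barrier hyperparameter t (assumed)
        "learning_rate": 3e-4,             # Adam step size (assumed)
        "trajectories_per_iteration": 50,  # N sampled trajectories (assumed)
        "random_seeds": [0, 1, 2, 3, 4],   # independent repeated runs (assumed)
    }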