IPO: Interior-Point Policy Optimization under Constraints
Authors: Yongshuai Liu, Jiaxin Ding, Xin Liu
Pages: 4940-4947
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction. |
| Researcher Affiliation | Academia | Yongshuai Liu, Jiaxin Ding, Xin Liu; University of California, Davis; {yshliu, jxding, xinliu}@ucdavis.edu |
| Pseudocode | Yes | Algorithm 1 The procedure of IPO |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the described methodology was found. |
| Open Datasets | Yes | We conduct experiments and compare IPO with CPO and PDO in various scenarios: three tasks in the Mujoco simulator (Point-Gather, Point-Circle (Achiam et al. 2017), Half Cheetah-Safe (Chow et al. 2019)) and a grid-world task (Mars-Rover) inspired by (Chow et al. 2015). |
| Dataset Splits | No | The paper describes sampling N trajectories and running experiments multiple times with different random seeds, but does not provide specific dataset split information (e.g., percentages or counts for training, validation, and test sets) in the conventional supervised learning sense. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions software components like PPO, TRPO, Adam optimizer, and Mujoco simulator, but does not provide specific version numbers for any of them. |
| Experiment Setup | No | The paper discusses hyperparameters such as the PPO clip rate 'r', the logarithmic barrier hyperparameter 't', and learning rates, but it reports only their tuning behavior or ranges, not the concrete numerical values used in the main experimental setup. |
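
To make the roles of the two hyperparameters named in the Experiment Setup row more concrete, here is a minimal sketch of how a PPO-style clipped surrogate can be combined with a logarithmic barrier term. It assumes a single constraint of the form "expected cost <= limit" and a barrier of the form log(limit - cost) / t. The function names and all numeric values are hypothetical illustrations, not the authors' implementation or their settings (the paper does not report concrete values).

```python
import math


def log_barrier(constraint_value, limit, t):
    """Logarithmic barrier log(limit - constraint_value) / t.

    Assumption: the constraint has the form constraint_value <= limit.
    A larger t flattens the barrier inside the feasible region, making it a
    sharper approximation of a hard constraint.
    """
    slack = limit - constraint_value
    if slack <= 0:
        # Outside the feasible region the barrier is undefined; treat it as -inf.
        return float("-inf")
    return math.log(slack) / t


def clipped_surrogate(ratio, advantage, clip):
    """PPO-style clipped surrogate for one sample, with clip rate `clip`."""
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


def barrier_objective(ratios, advantages, constraint_value, limit, t, clip):
    """Mean clipped surrogate plus the log barrier (quantity to maximize)."""
    surrogate = sum(
        clipped_surrogate(r, a, clip) for r, a in zip(ratios, advantages)
    ) / len(ratios)
    return surrogate + log_barrier(constraint_value, limit, t)


if __name__ == "__main__":
    # Illustrative values only.
    value = barrier_objective(
        ratios=[1.0, 1.3], advantages=[0.5, 1.0],
        constraint_value=0.2, limit=1.0, t=20.0, clip=0.2,
    )
    print(value)
```

In this sketch the barrier drives the objective toward negative infinity as the constraint value approaches the limit, which is what discourages updates that would violate the constraint.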