Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
IPO: Interior-Point Policy Optimization under Constraints
Authors: Yongshuai Liu, Jiaxin Ding, Xin Liu (pp. 4940-4947)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms in terms of reward maximization and constraint satisfaction. |
| Researcher Affiliation | Academia | Yongshuai Liu, Jiaxin Ding, Xin Liu University of California, Davis |
| Pseudocode | Yes | Algorithm 1 The procedure of IPO |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the described methodology was found. |
| Open Datasets | Yes | We conduct experiments and compare IPO with CPO and PDO in various scenarios: three tasks in the Mujoco simulator (Point-Gather, Point-Circle (Achiam et al. 2017), Half Cheetah-Safe (Chow et al. 2019)) and a grid-world task (Mars-Rover) inspired by (Chow et al. 2015). |
| Dataset Splits | No | The paper describes sampling N trajectories and running experiments multiple times with different random seeds, but does not provide specific dataset split information (e.g., percentages or counts for training, validation, and test sets) in the conventional supervised learning sense. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions software components like PPO, TRPO, Adam optimizer, and Mujoco simulator, but does not provide specific version numbers for any of them. |
| Experiment Setup | No | While the paper discusses hyperparameters like the PPO clip rate 'r', logarithmic barrier hyperparameter 't', and learning rates, it does not provide concrete numerical values for these specific parameters used in the main experimental setup, but rather discusses their tuning or ranges. |
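For context on what the hyperparameters named in the last row control, the sketch below shows a single-sample, scalar version of an IPO-style objective: a PPO clipped surrogate augmented with a logarithmic barrier on the constraint cost, which diverges to negative infinity as the cost estimate approaches its limit. The function name, default values (`clip_eps=0.2`, `t=20.0`), and scalar formulation are illustrative assumptions, not the paper's implementation.

```python
import math

def ipo_objective(ratio, advantage, cost_estimate, cost_limit,
                  clip_eps=0.2, t=20.0):
    # PPO clipped surrogate: clip the probability ratio, then take
    # the pessimistic minimum of the clipped and unclipped terms.
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)

    # Logarithmic barrier for the constraint J_C <= d:
    # log(d - J_C) / t penalizes policies near the cost limit,
    # with larger t making the barrier sharper near the boundary.
    slack = cost_limit - cost_estimate
    if slack <= 0.0:
        return float("-inf")  # infeasible: constraint violated
    return surrogate + math.log(slack) / t
```

In practice the surrogate and cost terms would be batch averages over sampled trajectories; this scalar form only illustrates how the clip rate and barrier coefficient enter the objective.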