Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control
Authors: Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, Wojciech Matusik
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed algorithm can efficiently find a significantly higher-quality set of Pareto-optimal policies than existing methods. |
| Researcher Affiliation | Academia | (1) Computer Science & Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology; (2) Texas A&M University. |
| Pseudocode | Yes | Algorithm 1: Prediction-Guided MORL Algorithm |
| Open Source Code | Yes | The code can be found at https://github.com/mitgfx/PGMORL |
| Open Datasets | No | The paper designs 'seven multi-objective RL environments with continuous action space based on Mujoco', described in Appendix C, but it does not provide concrete access information (link, DOI, or formal citation for a public dataset repository) for these specific environments as a dataset. |
| Dataset Splits | No | The paper describes reinforcement learning stages (Warm-up Stage, Evolutionary Stage) and training processes for policies, but it does not specify traditional dataset splits (e.g., percentages or counts for training, validation, and testing data) as would be typical for supervised learning tasks. |
| Hardware Specification | No | The paper mentions evaluating performance using a 'physics-based simulation system (Todorov et al., 2012)' but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as MuJoCo, PPO, t-SNE, and k-means, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | More details about the experiment setup are described in Appendix D.1. The training details and parameters are reported in Appendix D.2. |