Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Reparameterized Policy Learning for Multimodal Trajectory Optimization
Authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. |
| Researcher Affiliation | Collaboration | ¹UC San Diego, ²MIT-IBM Watson AI Lab, ³UMass Amherst. |
| Pseudocode | Yes | We describe the whole algorithm in Alg. 1 and implementation details in Appendix A. |
| Open Source Code | Yes | Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/ |
| Open Datasets | Yes | We take 8 representative environments from standard RL benchmarks, including 2 table-top environments from MetaWorld (Yu et al., 2020), 2 dexterous hand manipulation tasks from Rajeswaran et al. (2017), 1 navigation problem from Nachum et al. (2018b), and 2 articulated object manipulation from ManiSkill (Mu et al., 2021). |
| Dataset Splits | No | The paper evaluates performance within dynamic reinforcement learning environments and describes interaction-based data collection, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts from a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using 'pytorch' but does not specify its version number or the versions of any other key software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | The hyperparameters for training the network are listed in Table 1. |