Reparameterized Policy Learning for Multimodal Trajectory Optimization
Authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. |
| Researcher Affiliation | Collaboration | ¹UC San Diego, ²MIT-IBM Watson AI Lab, ³UMass Amherst. |
| Pseudocode | Yes | We describe the whole algorithm in Alg. 1 and implementation details in Appendix A. |
| Open Source Code | Yes | Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/ |
| Open Datasets | Yes | We take 8 representative environments from standard RL benchmarks, including 2 table-top environments from Meta World (Yu et al., 2020), 2 dexterous hand manipulation tasks from Rajeswaran et al. (2017), 1 navigation problem from Nachum et al. (2018b), and 2 articulated object manipulation tasks from ManiSkill (Mu et al., 2021). |
| Dataset Splits | No | The paper evaluates performance within dynamic reinforcement learning environments and describes interaction-based data collection, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts from a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using 'pytorch' but does not specify its version number or the versions of any other key software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | The hyperparameters for training the network are listed in Table 1. |