Reparameterized Policy Learning for Multimodal Trajectory Optimization

Authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward.
Researcher Affiliation | Collaboration | UC San Diego, MIT-IBM Watson AI Lab, UMass Amherst.
Pseudocode | Yes | We describe the whole algorithm in Alg. 1 and implementation details in Appendix A.
Open Source Code | Yes | Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/
Open Datasets | Yes | We take 8 representative environments from standard RL benchmarks, including 2 table-top environments from Meta-World (Yu et al., 2020), 2 dexterous hand manipulation tasks from Rajeswaran et al. (2017), 1 navigation problem from Nachum et al. (2018b), and 2 articulated object manipulation tasks from ManiSkill (Mu et al., 2021).
Dataset Splits | No | The paper evaluates performance within dynamic reinforcement learning environments and describes interaction-based data collection, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts from a static dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud computing instances.
Software Dependencies | No | The paper mentions using 'pytorch' but does not specify its version number or the versions of any other key software dependencies required to reproduce the experiments.
Experiment Setup | Yes | The hyperparameters for training the network are listed in Table 1.
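
The Dataset Splits row above reflects that, as in most online RL work, training data is gathered by interacting with the environments rather than drawn from a fixed dataset with train/validation/test partitions. The sketch below illustrates that collection pattern only; it is not code from the paper. It assumes a Gymnasium-style interface, and "CartPole-v1" is a generic stand-in for the paper's actual Meta-World / ManiSkill / navigation tasks, which require their own installations.

import gymnasium as gym

# Toy illustration of online data collection: transitions are appended to a
# buffer as the agent acts, so no static dataset split exists to report.
env = gym.make("CartPole-v1")  # stand-in environment, not one used in the paper
replay_buffer = []

obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # placeholder for a learned policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    replay_buffer.append((obs, action, reward, next_obs, terminated))
    obs = next_obs
    if terminated or truncated:
        obs, info = env.reset()

env.close()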