Making Better Decision by Directly Planning in Continuous Control

Authors: Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluation, we conduct several experiments on the benchmark MuJoCo continuous control tasks. The results show our proposed method can significantly improve the sample efficiency and asymptotic performance. Besides, comprehensive ablation studies are also performed to verify the necessity and effectiveness of our proposed D3P planner.
Researcher Affiliation | Collaboration | Jinhua Zhu¹, Yue Wang², Lijun Wu², Tao Qin², Wengang Zhou¹, Tie-Yan Liu², Houqiang Li¹; ¹University of Science and Technology of China; ²Microsoft Research AI4Science
Pseudocode | Yes | Algorithm 1 Deep Differential Dynamic Programming (D3P) (Page 5) and Algorithm 2 POMP (Page 9)
Open Source Code | Yes | Our code is released at https://github.com/POMP-D3P/POMP-D3P.
Open Datasets | Yes | To answer the above questions, we evaluate our method on continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012).
Dataset Splits | No | The paper describes continuous interaction with simulation environments and refers to 'training steps' and 'evaluation', but does not specify explicit training/validation/test dataset splits with percentages or sample counts in the way supervised learning on static datasets does.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer (Kingma & Ba, 2014)', 'MuJoCo simulator (Todorov et al., 2012)', and refers to other RL algorithms (e.g., SAC (Haarnoja et al., 2018)), but it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | The detailed hyper-parameters are summarized in Table 1, and refer to Janner et al. (2019b); Clavera et al. (2019) for more details. (...) Table 1: The detailed hyper-parameters in our experiments.