Making Better Decision by Directly Planning in Continuous Control

Authors: Jinhua Zhu, Yue Wang, Lijun Wu, Tao Qin, Wengang Zhou, Tie-Yan Liu, Houqiang Li

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For evaluation, we conduct several experiments on the benchmark MuJoCo continuous control tasks. The results show our proposed method can significantly improve the sample efficiency and asymptotic performance. Besides, comprehensive ablation studies are also performed to verify the necessity and effectiveness of our proposed D3P planner.
Researcher Affiliation	Collaboration	Jinhua Zhu (1), Yue Wang (2), Lijun Wu (2), Tao Qin (2), Wengang Zhou (1), Tie-Yan Liu (2), Houqiang Li (1). (1) University of Science and Technology of China; (2) Microsoft Research AI4Science
Pseudocode	Yes	Algorithm 1 Deep Differential Dynamic Programming (D3P) (Page 5) and Algorithm 2 POMP (Page 9)
Open Source Code	Yes	Our code is released at https://github.com/POMP-D3P/POMP-D3P.
Open Datasets	Yes	To answer the above questions, we evaluate our method on continuous control benchmark tasks in the MuJoCo simulator (Todorov et al., 2012).
Dataset Splits	No	The paper describes continuous interaction with simulation environments and refers to 'training steps' and 'evaluation', but does not specify explicit training/validation/test dataset splits with percentages or sample counts in the way supervised learning on static datasets does.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies	No	The paper mentions software components like 'Adam optimizer (Kingma & Ba, 2014)', 'MuJoCo simulator (Todorov et al., 2012)', and refers to other RL algorithms (e.g., SAC (Haarnoja et al., 2018)), but it does not specify version numbers for any of these software dependencies.
Experiment Setup	Yes	The detailed hyper-parameters are summarized in Table 1, and refer to Janner et al. (2019b); Clavera et al. (2019) for more details. (...) Table 1: The detailed hyper-parameters in our experiments.