Deep Reinforcement Learning for Multi-contact Motion Planning of Hexapod Robots

Authors: Huiqiao Fu, Kaiqiang Tang, Peng Li, Wenqi Zhang, Xinpeng Wang, Guizhou Deng, Tao Wang, Chunlin Chen

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Both the simulation and experimental results on physical systems demonstrate the feasibility and efficiency of the proposed method. Videos are shown at https://videoviewpage.wixsite.com/mcrl.
Researcher Affiliation | Collaboration | Huiqiao Fu (1,2), Kaiqiang Tang (1), Peng Li (2), Wenqi Zhang (2), Xinpeng Wang (3), Guizhou Deng (3), Tao Wang (2) and Chunlin Chen (1); 1: School of Management and Engineering, Nanjing University, China; 2: Advanced Institute of Information Technology (AIIT), Peking University, China; 3: Southwest University of Science and Technology, China. Emails: {hqfu, kqtang}@smail.nju.edu.cn, {pli, wqzhang}@aiit.org.cn, wangtao@pku.edu.cn, {xpwang, gzdeng}@mails.swust.edu.cn, clchen@nju.edu.cn
Pseudocode | Yes | Algorithm 1: DRL for Multi-contact Motion Planning
Open Source Code | No | The paper mentions "Videos are shown at https://videoviewpage.wixsite.com/mcrl" but does not provide any link or statement regarding the availability of source code for the methodology.
Open Datasets | No | The paper describes building its own simulation environments (E1, E2, E3) and a real environment (E4) with generated plum-blossom piles, but these are not publicly available datasets with concrete access information (link, DOI, etc.).
Dataset Splits | No | The paper describes a training process and testing of policies but does not explicitly mention specific train/validation/test dataset splits or their percentages/counts.
Hardware Specification | Yes | We train our policy network on a computer with an i7-7700 CPU and an Nvidia GTX 1060ti GPU.
Software Dependencies | No | The RL algorithm is implemented using PyTorch, and the transition feasibility model used in the training process is solved using CasADi. (The footnotes point to the library websites, but specific version numbers are not provided in the text.)
Experiment Setup | Yes | For a training process, we first set a random initial point and a random target area with radius of 100 mm. The goal of the hexapod robot is to move successfully from the initial point to the target area with the shortest path. At the beginning of the training process, we first reset the CoM of the hexapod robot to the initial point. ... When the CoM of the hexapod robot reaches the target area or the maximum number of steps in the current episode reaches 300, the current episode is terminated and a new episode is started. We repeat the above process until the end of the training. ... a total of 1 million time-steps are set for training and the whole training process takes about 12 hours in each environment.
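The Experiment Setup row above describes the episode structure used for training. The sketch below is a rough illustration only: the environment object, its reset/step interface, and the policy object are hypothetical placeholders (the authors do not release code), and only the target radius (100 mm), the 300-step episode cap, and the 1-million-time-step budget come from the paper.

```python
# Rough sketch of the training loop described in the Experiment Setup row.
# env, policy and their methods are hypothetical placeholders; only the
# numeric settings below are taken from the paper.

TARGET_RADIUS_MM = 100        # random target area with a radius of 100 mm
MAX_EPISODE_STEPS = 300       # an episode ends after at most 300 steps
TOTAL_TIMESTEPS = 1_000_000   # 1 million time-steps of training per environment


def train(env, policy):
    """Repeat episodes until the total time-step budget is used up."""
    steps_done = 0
    while steps_done < TOTAL_TIMESTEPS:
        # Reset the robot's CoM to a random initial point and sample a
        # random target area (both assumed to be handled inside env.reset).
        state = env.reset()
        for _ in range(MAX_EPISODE_STEPS):
            action = policy.act(state)
            next_state, reward, done, _ = env.step(action)
            policy.store(state, action, reward, next_state, done)
            policy.update()            # one DRL update step
            state = next_state
            steps_done += 1
            if done or steps_done >= TOTAL_TIMESTEPS:
                # done is True when the CoM reaches the target area
                break
```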