Learning Unmanned Aerial Vehicle Control for Autonomous Target Following

Authors: Siyi Li, Tianbo Liu, Chi Zhang, Dit-Yan Yeung, Shaojie Shen

IJCAI 2018

Reproducibility assessment (each entry lists the variable, the assessed result, and the LLM response):
Research Type: Experimental. "In this section we present a series of experiments to answer the following research questions: (1) Is introducing the PID controller essential for successful training? (2) How does the training strategy work compared to standard end-to-end training? (3) How does the learned high-level policy network generalize across different environments? To answer questions (1) and (2), we evaluate different variations of the proposed system in Section 4.2 by training policies for the target following task in a simulated environment. We further evaluate the generalization ability of the learned policy in Section 4.3 by testing it in various simulated environments. Finally, we set up a real-world flight test in Section 4.4."
Researcher Affiliation: Academia. Siyi Li (1), Tianbo Liu (2), Chi Zhang (1), Dit-Yan Yeung (1), Shaojie Shen (2); (1) Department of Computer Science and Engineering, HKUST; (2) Department of Electronic and Computer Engineering, HKUST. {sliay, czhangbr, dyyeung}@cse.ust.hk, {tliuam, eeshaojie}@ust.hk
Pseudocode: No. The paper describes the Deep Deterministic Policy Gradient (DDPG) algorithm and its hierarchical control system, but it does not include any structured pseudocode or algorithm blocks. A hedged sketch of the described hierarchy is given after this list.
Open Source Code: No. The paper neither states that source code is released nor provides a link to a code repository for the described methodology.
Open Datasets: No. The paper states that the experiments were set up on the Virtual Robot Experimentation Platform (V-REP) using its built-in quadrotor model, and that training data was collected by randomly moving the quadrotor and recording camera images (a hypothetical collection loop is sketched after this list). However, it provides no concrete access information (link, DOI, or formal citation) for a publicly available dataset used for training.
Dataset Splits: No. The paper discusses training strategies and evaluation, but it does not provide dataset split information (e.g., percentages or sample counts for training, validation, and test sets) needed for reproducibility.
Hardware Specification: Yes. "Our quadrotor testbed is based on the DJI Matrice 100 platform, equipped with an Intel NUC and a camera. For speed considerations, the policy network computation is deployed on a ground laptop with an M1000 GPU, which communicates with the onboard NUC via the Robot Operating System (ROS)." A sketch of such a ground-station node is given after this list.
Software Dependencies: No. The paper mentions that the implementation is based on rllab and uses Adam for optimization, but it does not provide version numbers for these or for other software components such as programming languages or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup: Yes. "For the reward setting, we use τ1 = 0.05, τ2 = 0.2, and c = 0.5. We use Adam [Kingma and Ba, 2015] for optimization with the hyperparameters set according to [Lillicrap et al., 2016]." An illustrative reconstruction of these settings follows below.
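
Since the paper provides no pseudocode, the following minimal Python sketch illustrates the kind of two-level scheme the Pseudocode row describes: a learned high-level policy emits a velocity setpoint, and a low-level PID controller tracks it. All names, gains, and the callback interface here are assumptions for illustration, not the authors' implementation.

```python
class PID:
    """Minimal PID controller; the gains used below are illustrative, not from the paper."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def control_loop(policy, get_observation, get_velocity, send_command, dt=0.02):
    """High-level policy picks a velocity setpoint; the PID layer tracks it."""
    pid = PID(kp=1.0, ki=0.1, kd=0.05, dt=dt)
    while True:
        obs = get_observation()              # e.g., image of the tracked target
        v_target = policy(obs)               # high-level action: velocity setpoint
        v_actual = get_velocity()            # from onboard state estimation
        cmd = pid.step(v_target - v_actual)  # low-level correction term
        send_command(cmd)                    # forwarded to the flight controller
```

The point of such a split, as the paper's first research question suggests, is that the PID layer absorbs low-level stabilization so the learned policy only has to produce coarse setpoints.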
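The data collection described under Open Datasets (randomly moving the quadrotor in V-REP and recording camera images) could look roughly like the loop below, written against V-REP's legacy Python remote API. The object names, port, sampling ranges, and frame count are assumptions; only the general procedure comes from the paper.

```python
# Hypothetical V-REP data-collection loop using the legacy remote API bindings
# (vrep.py) shipped with V-REP. Object names, port, sampling ranges, and the
# frame count are assumptions, not taken from the paper.
import time
import numpy as np
import vrep

client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
_, quad = vrep.simxGetObjectHandle(client, 'Quadricopter_target', vrep.simx_opmode_blocking)
_, cam = vrep.simxGetObjectHandle(client, 'Vision_sensor', vrep.simx_opmode_blocking)

# Prime the image stream once, then read from the buffer on later calls.
vrep.simxGetVisionSensorImage(client, cam, 0, vrep.simx_opmode_streaming)

frames = []
for _ in range(1000):
    # Teleport the quadrotor target to a random position (ranges are made up).
    pos = np.random.uniform([-2.0, -2.0, 0.5], [2.0, 2.0, 2.0]).tolist()
    vrep.simxSetObjectPosition(client, quad, -1, pos, vrep.simx_opmode_oneshot)
    time.sleep(0.05)  # give the simulator time to render the new view

    ret, res, img = vrep.simxGetVisionSensorImage(client, cam, 0, vrep.simx_opmode_buffer)
    if ret == vrep.simx_return_ok:
        # The API returns pixels as signed bytes; wrap them back into [0, 255].
        frame = np.array(img, dtype=np.int16).astype(np.uint8).reshape(res[1], res[0], 3)
        frames.append(frame)

vrep.simxFinish(client)
```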
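For the hardware setup above, the ground-laptop side is typically a ROS node that subscribes to the onboard camera stream, runs the policy network on the local GPU, and publishes velocity commands back to the vehicle. The sketch below uses standard rospy calls, but the topic names, message types, and two-value action are assumptions.

```python
#!/usr/bin/env python
# Hypothetical ground-station ROS node; topic names and action layout are assumptions.
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist

class GroundStation:
    def __init__(self, policy):
        self.policy = policy
        self.cmd_pub = rospy.Publisher('/drone/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/drone/camera/image_raw', Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        # Run the policy network on the ground GPU, then send the action back.
        forward_vel, yaw_rate = self.policy(msg)  # assumed two-value action
        cmd = Twist()
        cmd.linear.x = forward_vel
        cmd.angular.z = yaw_rate
        self.cmd_pub.publish(cmd)

if __name__ == '__main__':
    rospy.init_node('ground_station')
    GroundStation(policy=lambda msg: (0.0, 0.0))  # placeholder policy
    rospy.spin()
```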
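The Experiment Setup row gives the reward constants but not the functional form of the reward, which is not reproduced in this report. A hypothetical thresholded tracking reward using those constants, together with the Adam learning rates that Lillicrap et al. (2016) use for DDPG (1e-4 for the actor, 1e-3 for the critic), might be written as follows; the interpolation scheme is an assumption.

```python
# Hypothetical reconstruction: only the constants tau1 = 0.05, tau2 = 0.2,
# and c = 0.5 come from the paper; the thresholded form is an assumption.
TAU1, TAU2, C = 0.05, 0.2, 0.5

def reward(tracking_error):
    """Shaped reward: full reward inside tau1, penalty c beyond tau2 (assumed form)."""
    if tracking_error < TAU1:
        return 1.0
    if tracking_error > TAU2:
        return -C
    # Linear interpolation between the two thresholds (assumption).
    return 1.0 - (tracking_error - TAU1) / (TAU2 - TAU1)

# Adam learning rates follow the DDPG defaults in Lillicrap et al. (2016):
# 1e-4 for the actor network and 1e-3 for the critic network.
ACTOR_LR, CRITIC_LR = 1e-4, 1e-3
```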