Multi-Task Deep Reinforcement Learning for Continuous Action Control
Authors: Zhaoyang Yang, Kathryn Merrick, Hussein Abbass, Lianwen Jin
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed algorithm and network fuse images with sensor data and were tested with up to 12 movement-based control tasks on a simulated Pioneer 3AT robot equipped with a camera and range sensors. Results show that the proposed algorithm and network can learn skills that are as good as the skills learned by a comparable single-task learning algorithm. ... Our simulations were conducted in Gazebo 2 under a ROS Indigo environment. ... We tested both the new network architecture and the multi-DDPG algorithm for 12 movement-based control tasks. ... We conducted two sets of experiments to test and analyse the proposed algorithm. |
| Researcher Affiliation | Academia | School of Engineering and Information Technology, University of New South Wales, Australia; School of Electronic and Information Engineering, South China University of Technology, China |
| Pseudocode | Yes | Algorithm 1 Multi-DDPG (a hedged structural sketch of such a loop is given after the table) |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper describes using a "simulated Pioneer 3AT robot" and conducting "simulations in Gazebo 2 under a ROS Indigo environment" but does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions training models for a certain number of episodes and testing them intermediately, but it does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into train, validation, and test sets. |
| Hardware Specification | No | The paper mentions conducting simulations in "Gazebo 2 under a ROS Indigo environment" using a "Pioneer 3AT robot model" but does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Gazebo 2", a "ROS Indigo environment", and "Tensor Flow", but does not report version numbers for all ancillary software (notably the TensorFlow version) needed to replicate the experiment. |
| Experiment Setup | Yes | For all experiments, we trained the model 3 times and tested it intermediately. Each model is trained for 5,000 episodes, which contain approximately 60,000 training iterations. Adam [Kingma and Ba, 2015] is used to train the network, with initial learning rates 0.001 and 0.0001 for the critic and actor respectively. We set the discount factor to be 0.9 and train our networks in Tensor Flow [Abadi et al., 2016]. (These reported hyperparameters are collected in a sketch after the table.) |
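
Since the paper provides pseudocode (Algorithm 1 Multi-DDPG) but no released code, the following is a minimal structural sketch, in Python, of what a multi-task DDPG training loop could look like, assuming a shared trunk feeds per-task actor/critic heads and tasks are interleaved during training. Every name here (`ReplayBuffer`, `agent.act`, `agent.update`, `envs`) is a hypothetical stand-in, not the authors' implementation.

```python
import random

# Hypothetical skeleton of a multi-task DDPG training loop. Algorithm 1
# itself is not reproduced in this report, so the task schedule and the
# replay-buffer details below are assumptions, not the authors' method.


class ReplayBuffer:
    """Fixed-size experience buffer; the capacity is an assumed detail."""

    def __init__(self, capacity=100_000):
        self.buf, self.capacity = [], capacity

    def add(self, transition):
        if len(self.buf) >= self.capacity:
            self.buf.pop(0)  # drop the oldest transition
        self.buf.append(transition)

    def sample(self, batch_size=64):
        return random.sample(self.buf, min(batch_size, len(self.buf)))


def train_multi_ddpg(envs, agent, episodes=5_000, gamma=0.9):
    """Interleave up to 12 tasks episode by episode. Round-robin is one
    plausible schedule; the paper does not state how tasks are ordered."""
    buffers = [ReplayBuffer() for _ in envs]  # one buffer per task
    for episode in range(episodes):
        task = episode % len(envs)            # round-robin task selection
        state = envs[task].reset()
        done = False
        while not done:
            action = agent.act(state, task)   # task-specific actor head
            next_state, reward, done = envs[task].step(action)
            buffers[task].add((state, action, reward, next_state, done))
            batch = buffers[task].sample()
            if batch:
                # Standard DDPG update on the sampled batch, routed through
                # the head for this task while trunk weights stay shared.
                agent.update(batch, task, gamma)
            state = next_state
```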
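
The Experiment Setup row reports every optimizer hyperparameter needed to attempt a rerun. The sketch below merely collects those reported values in one place; since the paper does not report its TensorFlow version, the modern `tf.keras` Adam API is an assumption, not the authors' original call.

```python
import tensorflow as tf

# Hyperparameters exactly as reported in the paper's experiment setup.
CRITIC_LR = 1e-3            # initial learning rate for the critic
ACTOR_LR = 1e-4             # initial learning rate for the actor
DISCOUNT_FACTOR = 0.9       # gamma
EPISODES_PER_MODEL = 5_000  # roughly 60,000 training iterations
TRAINING_RUNS = 3           # each model is trained 3 times

# Adam optimizers per Kingma and Ba (2015); the tf.keras API is a modern
# stand-in, since the paper's TensorFlow version is unreported.
critic_optimizer = tf.keras.optimizers.Adam(learning_rate=CRITIC_LR)
actor_optimizer = tf.keras.optimizers.Adam(learning_rate=ACTOR_LR)
```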