Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Authors: Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real world domains, and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware." ... "Empirically we show that proposed autoregressive policies can be used with off-the-shelf learning algorithms and result in superior exploration and learning in sparse reward tasks compared to conventional Gaussian policies, while achieving similar or slightly better performance in tasks with dense reward." (A sketch of this kind of autoregressive exploration noise follows the table.)
Researcher Affiliation | Industry | Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan and James Bergstra; Kindred AI; {dmytro.korenkevych, rupam, gautham.vasan, james}@kindred.ai
Pseudocode | No | The paper provides mathematical derivations and equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "The code to reproduce experiments is available at https://github.com/kindredresearch/arp."
Open Datasets | Yes | "Figure 5 shows the learning results on standard Open AI Gym Mujoco environments [Brockman et al., 2016]."
Dataset Splits | No | The paper mentions using environments such as OpenAI Gym MuJoCo but does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions a 'UR5 robotic arm' for physical experiments but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | "In the following learning experiments we used the Open AI Baselines PPO algorithm implementation [Schulman et al., 2017]. The results with Baselines TRPO [Schulman et al., 2015] are provided in Supplementary Materials C." The paper names specific software libraries and algorithms but does not provide version numbers for these dependencies.
Experiment Setup | No | "For each experiment we used identical algorithm hyper-parameters and neural network structures to parametrize µθ, σθ and the value networks for both Gaussian and ARP policies. ... Detailed parameters for each task are included in Supplementary Materials E." The paper states that detailed parameters are in the supplementary materials, but they are not provided in the main text.
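
To make the "smooth exploration" claim in the Research Type row concrete, below is a minimal sketch of the kind of temporally correlated exploration noise an autoregressive policy produces. This is not the authors' implementation (the repository linked above contains that); it assumes a first-order autoregressive process whose marginal distribution stays standard normal, and the names `AR1ExplorationNoise` and `smoothing` are illustrative.

```python
import numpy as np

class AR1ExplorationNoise:
    """First-order autoregressive noise with a stationary N(0, 1) marginal.

    Illustrative sketch only: the paper's ARPs generalize this idea to
    higher-order processes; see https://github.com/kindredresearch/arp
    for the authors' implementation.
    """

    def __init__(self, action_dim, smoothing=0.9, rng=None):
        # `smoothing` in [0, 1): 0 recovers i.i.d. Gaussian noise,
        # values close to 1 give slowly varying (smooth) noise.
        self.phi = smoothing
        self.rng = np.random.default_rng() if rng is None else rng
        self.x = self.rng.standard_normal(action_dim)  # stationary start

    def sample(self):
        eps = self.rng.standard_normal(self.x.shape)
        # Scale the innovation so the marginal variance stays at 1.
        self.x = self.phi * self.x + np.sqrt(1.0 - self.phi ** 2) * eps
        return self.x


# Usage: perturb a (hypothetical) policy mean with correlated noise.
noise = AR1ExplorationNoise(action_dim=6, smoothing=0.9)
for _ in range(5):
    mean, std = np.zeros(6), 0.1 * np.ones(6)  # stand-ins for mu_theta(s), sigma_theta(s)
    action = mean + std * noise.sample()       # temporally smooth exploration
```

With `smoothing = 0` this reduces to the i.i.d. Gaussian noise of a conventional Gaussian policy; larger values trade per-step independence for smoother action trajectories, which is the property the abstract links to safe operation of robotic hardware.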