Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Authors: Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real world domains, and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware." ... "Empirically we show that proposed autoregressive policies can be used with off-the-shelf learning algorithms and result in superior exploration and learning in sparse reward tasks compared to conventional Gaussian policies, while achieving similar or slightly better performance in tasks with dense reward." (A sketch of this kind of autoregressive exploration noise follows the table.)
Researcher Affiliation | Industry | Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan and James Bergstra; Kindred AI; {dmytro.korenkevych, rupam, gautham.vasan, james}@kindred.ai
Pseudocode | No | The paper provides mathematical derivations and equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "The code to reproduce experiments is available at https://github.com/kindredresearch/arp."
Open Datasets | Yes | "Figure 5 shows the learning results on standard Open AI Gym Mujoco environments [Brockman et al., 2016]."
Dataset Splits | No | The paper mentions using environments such as OpenAI Gym MuJoCo but does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions a 'UR5 robotic arm' for physical experiments but does not provide specific details about the computing hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | "In the following learning experiments we used the Open AI Baselines PPO algorithm implementation [Schulman et al., 2017]. The results with Baselines TRPO [Schulman et al., 2015] are provided in Supplementary Materials C." The paper names specific software libraries and algorithms but does not provide version numbers for these dependencies.
Experiment Setup | No | "For each experiment we used identical algorithm hyper-parameters and neural network structures to parametrize µθ, σθ and the value networks for both Gaussian and ARP policies. ... Detailed parameters for each task are included in Supplementary Materials E." The paper states that detailed parameters are in the supplementary materials, but they are not provided in the main text.
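
To make the "smooth exploration" claim in the Research Type row concrete, below is a minimal sketch of the kind of temporally correlated exploration noise an autoregressive policy produces. This is not the authors' implementation (the repository linked above contains that); it assumes a first-order autoregressive process whose marginal distribution stays standard normal, and the names `AR1ExplorationNoise` and `smoothing` are illustrative.

```python
import numpy as np

class AR1ExplorationNoise:
    """First-order autoregressive noise with a stationary N(0, 1) marginal.

    Illustrative sketch only: the paper's ARPs generalize this idea to
    higher-order processes; see https://github.com/kindredresearch/arp
    for the authors' implementation.
    """

    def __init__(self, action_dim, smoothing=0.9, rng=None):
        # `smoothing` in [0, 1): 0 recovers i.i.d. Gaussian noise,
        # values close to 1 give slowly varying (smooth) noise.
        self.phi = smoothing
        self.rng = np.random.default_rng() if rng is None else rng
        self.x = self.rng.standard_normal(action_dim)  # stationary start

    def sample(self):
        eps = self.rng.standard_normal(self.x.shape)
        # Scale the innovation so the marginal variance stays at 1.
        self.x = self.phi * self.x + np.sqrt(1.0 - self.phi ** 2) * eps
        return self.x


# Usage: perturb a (hypothetical) policy mean with correlated noise.
noise = AR1ExplorationNoise(action_dim=6, smoothing=0.9)
for _ in range(5):
    mean, std = np.zeros(6), 0.1 * np.ones(6)  # stand-ins for mu_theta(s), sigma_theta(s)
    action = mean + std * noise.sample()       # temporally smooth exploration
```

With `smoothing = 0` this reduces to the i.i.d. Gaussian noise of a conventional Gaussian policy; larger values trade per-step independence for smoother action trajectories, which is the property the abstract links to safe operation of robotic hardware.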