Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
Authors: Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we test on a variety of difficult sparse reward problems simulated through Mujoco (Todorov et al., 2012). We use two popular and challenging agents: Ant and Humanoid. (See the environment sketch after the table.) |
| Researcher Affiliation | Collaboration | Kenneth Marino & Abhinav Gupta (Carnegie Mellon University, Facebook AI Research) {kdmarino,abhinavg}@cs.cmu.edu; Rob Fergus (New York University, Facebook AI Research) fergus@cs.nyu.edu; Arthur Szlam (Facebook AI Research) aszlam@fb.com |
| Pseudocode | Yes | Algorithm 1 Our method |
| Open Source Code | No | The paper provides a link to a project page (https://sites.google.com/view/hrl-ep3) which typically hosts supplementary materials and videos, but it does not explicitly state that the source code for the methodology is available at this link or elsewhere. |
| Open Datasets | Yes | In our experiments, we test on a variety of difficult sparse reward problems simulated through Mujoco (Todorov et al., 2012). We use two popular and challenging agents: Ant and Humanoid. ... We compare our method to baselines similar to those used in Haarnoja et al. (2018a), all trained with PPO as is our method. The baseline models are either trained with or without the phase conditioning, and either from scratch, or finetuned (meaning that we initialize the network using a network trained on our low-level objective). We also give some of the baselines more information by also giving them a velocity reward during high-level training (meaning they are rewarded for movement of the agent). |
| Dataset Splits | No | The paper conducts experiments in reinforcement learning environments and does not specify train/validation/test splits; no static dataset is involved, which is typical for interactive simulation settings. |
| Hardware Specification | No | The paper mentions 'running serially on CPU' when comparing to other methods, but it does not provide specific details on the hardware used, such as CPU model, number of cores, or GPU specifications. |
| Software Dependencies | No | The paper mentions using implementations from Kostrikov (2018) for RL algorithms (PPO, A2C) and its own DQN implementation, as well as the ADAM optimizer. However, it does not provide specific version numbers for these software components or any underlying libraries like PyTorch or TensorFlow. |
| Experiment Setup | Yes | The hyperparameters for these three algorithms are shown in Tables 1, 2 and 3. We use the ADAM (Kingma & Ba, 2014) optimizer. ... During low-level training we train 80 policies using different random seeds. ... For our Ant models, we use a 3-layer MLP with tanh activation functions and a hidden size of 32. For Humanoid we add skip connections between layers and decrease the hidden size to 16. ... We choose the cyclic constraint multipliers for state (λs) and action (λa) to be 0.05 and 0.01 respectively. (See the policy-network sketch after the table.) |
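
The Experiment Setup row pins down the low-level network architecture well enough to sketch it. Below is a minimal PyTorch sketch, not the authors' code (PyTorch is assumed because the paper builds on Kostrikov (2018)'s implementations); the class name, action head, observation/action dimensions, and learning rate are illustrative assumptions, and phase conditioning and the cyclic-constraint terms are not modeled.

```python
import torch
import torch.nn as nn


class LowLevelPolicyTrunk(nn.Module):
    """Sketch of the policy trunk described in the Experiment Setup row:
    a 3-layer MLP with tanh activations (hidden size 32 for Ant; hidden
    size 16 with skip connections between layers for Humanoid). The class
    name and head structure are assumptions, not the paper's code."""

    def __init__(self, obs_dim, act_dim, hidden_size=32, use_skip=False):
        super().__init__()
        self.use_skip = use_skip
        self.fc1 = nn.Linear(obs_dim, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, hidden_size)
        self.action_head = nn.Linear(hidden_size, act_dim)  # action mean

    def forward(self, obs):
        h1 = torch.tanh(self.fc1(obs))
        h2 = torch.tanh(self.fc2(h1))
        if self.use_skip:
            h2 = h2 + h1  # skip connection (Humanoid variant)
        h3 = torch.tanh(self.fc3(h2))
        if self.use_skip:
            h3 = h3 + h2
        return self.action_head(h3)


# Observation/action sizes below are the standard Gym Ant-v2 / Humanoid-v2
# ones, assumed for illustration; the paper's proprioceptive observations
# may differ.
ant_policy = LowLevelPolicyTrunk(obs_dim=111, act_dim=8, hidden_size=32)
humanoid_policy = LowLevelPolicyTrunk(obs_dim=376, act_dim=17,
                                      hidden_size=16, use_skip=True)

# The paper states ADAM is used; the learning rate here is a placeholder,
# since the actual values live in the paper's Tables 1-3.
optimizer = torch.optim.Adam(ant_policy.parameters(), lr=3e-4)
```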
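
The Research Type and Open Datasets rows name only the MuJoCo Ant and Humanoid agents. The snippet below is a hedged sketch that instantiates the standard Gym MuJoCo versions of these agents; the environment IDs and the old (pre-0.26) Gym step/reset API are assumptions, and the paper's sparse-reward task variants built on top of these agents are not reproduced.

```python
import gym  # pre-0.26 Gym API assumed, matching the paper's era

# Standard Gym MuJoCo IDs are assumed for the two agents named in the paper;
# the sparse-reward tasks layered on top of them are not specified in the
# excerpts above and are not reproduced here.
for env_id in ["Ant-v2", "Humanoid-v2"]:
    env = gym.make(env_id)
    obs = env.reset()
    for _ in range(5):  # a few random steps to confirm the simulator runs
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    print(env_id, "obs:", env.observation_space.shape,
          "act:", env.action_space.shape)
    env.close()
```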