Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
Authors: Avi Singh, Huihan Liu, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart, Sergey Levine
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments seek to answer: (1) Can the behavioral prior accelerate learning of new tasks? (2) How does PARROT compare to prior works that accelerate RL with demonstrations? (3) How does PARROT compare to prior methods that combine hierarchical imitation with RL? ... Our results are summarised in Figure 5. We see that PARROT is able to solve all of the tasks substantially faster and achieve substantially higher final returns than other methods. |
| Researcher Affiliation | Academia | Avi Singh, Huihan Liu, Gaoyue Zhou, Albert Yu, Nicholas Rhinehart, Sergey Levine; University of California, Berkeley. Equal contribution. Correspondence to Avi Singh (avisingh@berkeley.edu). |
| Pseudocode | Yes | Algorithm 1 RL with Behavioral Priors ... Algorithm 2 Scripted Grasping ... Algorithm 3 Scripted Pick and Place (A hedged sketch of the Algorithm 1 latent-space RL loop appears below the table.) |
| Open Source Code | No | Additional materials can be found on our project website: https://sites.google.com/view/parrot-rl (This link leads to a project website, not an explicit code repository or code release statement.) |
| Open Datasets | Yes | To collect data in diverse environments, we used 3D object models from the ShapeNet dataset (Chang et al., 2015) and the PyBullet (Coumans & Bai, 2016) object libraries. |
| Dataset Splits | No | The paper discusses training and testing, but does not explicitly provide details on validation splits (e.g., specific percentages or sample counts for validation data). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) are provided for running the experiments. |
| Software Dependencies | No | The paper mentions tools and algorithms like Adam optimizer, Soft Actor-Critic, and PyBullet, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use a learning rate of 1e-4 and the Adam (Kingma & Ba, 2015) optimizer to train the behavioral prior for 500K steps. ... Table 1: Hyperparameters for soft actor-critic (SAC): target network update period = 1000 steps; discount factor γ = 0.99; policy learning rate = 3e-4; Q-function learning rate = 3e-4; reward scale = 1.0; automatic entropy tuning = enabled; number of update steps per env step = 1. (A hedged configuration sketch collecting these values appears below the table.) |
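The Pseudocode row above refers to Algorithm 1, "RL with Behavioral Priors", in which the RL agent acts in a latent space and a pre-trained behavioral prior maps latent actions to environment actions. Below is a minimal Python sketch of that loop, assuming hypothetical `prior`, `policy`, `replay_buffer`, and `sac_update` interfaces; it is an illustration of the idea, not the authors' implementation.

```python
def rl_with_behavioral_prior(env, prior, policy, replay_buffer, sac_update,
                             num_env_steps=100_000):
    """Sketch of latent-space RL with a frozen behavioral prior.

    The RL algorithm (e.g. SAC) treats the latent variable z as its action;
    the pre-trained behavioral prior maps (z, observation) to an executable
    action. All object interfaces here are hypothetical placeholders.
    """
    obs = env.reset()
    for _ in range(num_env_steps):
        z = policy.sample(obs)                  # latent action from the RL policy
        action = prior.map_to_action(z, obs)    # behavioral prior: z -> action
        next_obs, reward, done, _ = env.step(action)

        # Store the latent z (not the mapped action) so the actor and critic
        # are trained entirely in the latent action space.
        replay_buffer.add(obs, z, reward, next_obs, done)
        sac_update(policy, replay_buffer)       # one SAC update per env step

        obs = env.reset() if done else next_obs
```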
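Similarly, the Experiment Setup row quotes the behavioral-prior training settings and the SAC hyperparameters from the paper's Table 1. The sketch below collects those reported values into configuration dictionaries; the dictionary keys are illustrative, while the values come from the quoted text.

```python
# Behavioral prior pre-training settings quoted in the Experiment Setup row.
prior_training_config = {
    "optimizer": "Adam",                     # Kingma & Ba, 2015
    "learning_rate": 1e-4,
    "training_steps": 500_000,
}

# SAC hyperparameters reported in the paper's Table 1.
sac_config = {
    "target_network_update_period": 1_000,  # steps
    "discount_factor": 0.99,                 # gamma
    "policy_learning_rate": 3e-4,
    "q_function_learning_rate": 3e-4,
    "reward_scale": 1.0,
    "automatic_entropy_tuning": True,
    "updates_per_env_step": 1,
}
```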