Hierarchical Reinforcement Learning by Discovering Intrinsic Options

Authors: Jesse Zhang, Haonan Yu, Wei Xu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate success rate and sample efficiency across two environment suites, as shown in Figure 2. Important details are presented here with more information in Appendix Section B.
Researcher Affiliation | Collaboration | Jesse Zhang (1), Haonan Yu (2), Wei Xu (2); (1) University of Southern California, (2) Horizon Robotics
Pseudocode | Yes | The appendix section "Pseudo Code for HIDIO" presents Algorithm 1: Hierarchical RL with Intrinsic Options Discovery (a structural sketch follows the table).
Open Source Code | Yes | Code available at https://www.github.com/jesbu1/hidio.
Open Datasets | Yes | The first suite consists of two 7-DOF reaching and pushing environments evaluated in Chua et al. (2018). ... We also propose another suite of environments called SOCIALROBOT. We construct two sparse reward robotic navigation and manipulation tasks, GOALTASK and KICKBALL. ... Code available at https://www.github.com/HorizonRobotics/SocialRobot
Dataset Splits | No | The paper describes training parameters and evaluation intervals for continuous interaction with environments, but does not specify explicit training/validation/test splits (percentages or counts from a pre-defined static dataset), as is typical for supervised learning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running its experiments.
Software Dependencies | No | The paper states "We implement HIDIO based on an RL framework called ALF", but does not provide version numbers for ALF or for other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | Number of parallel actors/environments per rollout: 20; steps per episode: 100; batch size: 2048; learning rate: 1e-4 for all network modules; policy/Q network hidden layers: (256, 256, 256) with ReLU non-linearities; Polyak averaging coefficient for target Q: 0.999; training batches per iteration: 100.
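
For quick reference, the values reported in the Experiment Setup row can be collected into a plain Python dictionary. This is only a convenience sketch; the key names are our own shorthand and do not correspond to ALF's actual configuration keys.

# Hyperparameters as reported in the Experiment Setup row above.
# Key names are our own shorthand, not ALF configuration keys.
HIDIO_REPORTED_HYPERPARAMS = {
    "num_parallel_actors": 20,                   # parallel actors/environments per rollout
    "steps_per_episode": 100,
    "batch_size": 2048,
    "learning_rate": 1e-4,                       # shared by all network modules
    "policy_q_hidden_layers": (256, 256, 256),   # with ReLU non-linearities
    "target_q_polyak": 0.999,                    # Polyak averaging coefficient for target Q
    "training_batches_per_iteration": 100,
}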
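
The Pseudocode row above cites Algorithm 1 (Hierarchical RL with Intrinsic Options Discovery). The Python sketch below is a minimal, hedged reading of the hierarchy that algorithm describes: a higher-level scheduler emits a latent option every K steps, a lower-level worker acts conditioned on the current option, and a discriminator that tries to recover the option from the worker's transitions supplies the worker's intrinsic reward. All class and function names here (Scheduler, Worker, Discriminator, rollout, DummyEnv) are hypothetical placeholders rather than the paper's or ALF's API, and the learning updates themselves are omitted.

# Structural sketch of the hierarchy behind Algorithm 1, under the assumptions
# stated above. Names are hypothetical placeholders; no learning is performed.
import numpy as np

OPTION_DIM = 8
ACTION_DIM = 2
K = 3  # the scheduler emits a new option every K environment steps

class Scheduler:
    """Higher-level policy: maps the current state to a latent option u."""
    def sample_option(self, state):
        return np.random.uniform(-1.0, 1.0, size=OPTION_DIM)  # placeholder

class Worker:
    """Lower-level policy: maps (state, option) to a primitive action."""
    def act(self, state, option):
        return np.random.uniform(-1.0, 1.0, size=ACTION_DIM)  # placeholder

class Discriminator:
    """Scores how well the option can be recovered from a worker transition;
    this score is used as the worker's intrinsic reward."""
    def log_prob(self, option, state, action, next_state):
        return float(-np.sum(option ** 2))  # placeholder log-likelihood

def rollout(env, scheduler, worker, discriminator, steps_per_episode=100):
    """Collect one episode, accumulating environment reward for the scheduler
    and intrinsic reward for the worker."""
    state = env.reset()
    option = None
    scheduler_return, worker_intrinsic_return = 0.0, 0.0
    for t in range(steps_per_episode):
        if t % K == 0:  # scheduler decision point
            option = scheduler.sample_option(state)
        action = worker.act(state, option)
        next_state, reward, done = env.step(action)
        scheduler_return += reward  # scheduler is driven by the task reward
        worker_intrinsic_return += discriminator.log_prob(
            option, state, action, next_state)  # worker is driven by intrinsic reward
        state = next_state
        if done:
            break
    return scheduler_return, worker_intrinsic_return

class DummyEnv:
    """Stand-in environment so the sketch runs end to end."""
    def reset(self):
        return np.zeros(4)
    def step(self, action):
        return np.zeros(4), 0.0, False

if __name__ == "__main__":
    print(rollout(DummyEnv(), Scheduler(), Worker(), Discriminator()))

The point the sketch is meant to surface, under the reading above, is the split of reward signals: the scheduler is trained on the environment's task reward, while the worker only ever sees the discriminator's score for its current option.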