Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies

Authors: Sungryull Sohn, Junhyuk Oh, Honglak Lee

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental results on two 2D visual domains show that our agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalize well to the unseen subtask graphs. In the experiment, we investigated the following research questions:"
Researcher Affiliation | Collaboration | Sungryull Sohn (University of Michigan, srsohn@umich.edu); Junhyuk Oh (University of Michigan, junhyuk@google.com); Honglak Lee (Google Brain and University of Michigan, honglak@google.com)
Pseudocode | Yes | Algorithm 1: Policy optimization (a hedged actor-critic sketch appears after this table).
Open Source Code | No | The paper neither states that source code for the described method will be released nor provides a link to a code repository.
Open Datasets | No | Mining domain: "The set of subtasks and preconditions are hand-coded based on the crafting recipes in Minecraft, and used as a template to generate 640 random subtask graphs." Playground: "We randomly generated 500 graphs for training and 2,000 graphs for testing." The subtask-graph datasets were generated by the authors for their experiments, and no public access information is provided.
Dataset Splits | No | For the Mining domain, the paper states: "We used 200 for training and 440 for testing." For the Playground domain: "We randomly generated 500 graphs for training and 2,000 graphs for testing." No explicit validation split is mentioned (see the generation-and-split sketch after this table).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, or memory sizes) used to run the experiments.
Software Dependencies | No | The paper mentions certain methods and frameworks (e.g., actor-critic, MazeBase), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "We used ηd=1e-4, ηc=3e-6 for distillation and ηac=1e-6, ηc=3e-7 for fine-tuning in the experiment." (A two-phase optimizer sketch using these rates follows the table.)
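
The paper's pseudocode is "Algorithm 1: Policy optimization," built on an actor-critic method. As a point of reference only, here is a minimal advantage actor-critic (A2C) update sketch in PyTorch; the network, batch format, and learning rate are hypothetical stand-ins and do not reproduce the paper's actual architecture or its policy-distillation step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared body with a policy (actor) head and a value (critic) head."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state value V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def a2c_update(model, optimizer, obs, actions, returns, value_coef=0.5):
    """One advantage actor-critic gradient step on a batch of transitions."""
    logits, values = model(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = returns - values.detach()        # A(s,a) = R - V(s)
    policy_loss = -(chosen * advantages).mean()   # actor: policy gradient
    value_loss = F.mse_loss(values, returns)      # critic: regress V toward R
    loss = policy_loss + value_coef * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on random data, just to show the shapes involved.
model = ActorCritic(obs_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
a2c_update(model, opt,
           torch.randn(32, 8),            # observations
           torch.randint(0, 4, (32,)),    # actions taken
           torch.randn(32))               # empirical returns
```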
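The subtask-graph datasets were generated by the authors and are not public. Purely to illustrate the reported split sizes for the Playground domain (500 training / 2,000 test graphs), here is a hedged sketch of generating random DAG-structured subtask graphs and splitting them; the generator itself is an assumption, not the paper's procedure.

```python
import random

def random_subtask_graph(n_subtasks, max_preconditions=2, rng=random):
    """Map each subtask index to a list of precondition subtasks.

    Preconditions only reference earlier-indexed subtasks, so the
    dependency structure is acyclic by construction.
    """
    graph = {}
    for i in range(n_subtasks):
        k = rng.randint(0, min(max_preconditions, i))
        graph[i] = sorted(rng.sample(range(i), k))
    return graph

rng = random.Random(0)
# Playground-style split from the paper: 500 training / 2,000 test graphs.
graphs = [random_subtask_graph(rng.randint(4, 16), rng=rng) for _ in range(2500)]
train_graphs, test_graphs = graphs[:500], graphs[500:]
assert len(train_graphs) == 500 and len(test_graphs) == 2000
```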
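The only setup details the assessment quotes are the learning rates for the two training phases. Below is a minimal sketch of wiring them into per-phase optimizers; the optimizer choice (Adam) and the actor/critic parameter split are assumptions, and only the rates themselves come from the paper.

```python
import torch

# Learning rates quoted from the paper's experiment setup.
DISTILL_LRS = {"eta_d": 1e-4, "eta_c": 3e-6}    # distillation phase
FINETUNE_LRS = {"eta_ac": 1e-6, "eta_c": 3e-7}  # fine-tuning phase

def make_optimizers(actor_params, critic_params, phase):
    """Build (actor, critic) optimizers for the given training phase.

    Adam is an assumed choice; the paper specifies only the rates.
    """
    if phase == "distillation":
        lrs = (DISTILL_LRS["eta_d"], DISTILL_LRS["eta_c"])
    elif phase == "finetune":
        lrs = (FINETUNE_LRS["eta_ac"], FINETUNE_LRS["eta_c"])
    else:
        raise ValueError(f"unknown phase: {phase}")
    return (torch.optim.Adam(actor_params, lr=lrs[0]),
            torch.optim.Adam(critic_params, lr=lrs[1]))
```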