Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
Authors: Sungryull Sohn, Junhyuk Oh, Honglak Lee
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on two 2D visual domains show that the agent can perform complex reasoning to find a near-optimal way of executing the subtask graph and generalize well to unseen subtask graphs. The experiments are organized around a set of explicit research questions. |
| Researcher Affiliation | Collaboration | Sungryull Sohn (University of Michigan, srsohn@umich.edu); Junhyuk Oh (University of Michigan, junhyuk@google.com); Honglak Lee (Google Brain and University of Michigan, honglak@google.com) |
| Pseudocode | Yes | The paper provides Algorithm 1 (Policy optimization); a generic actor-critic sketch of such an update appears after this table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described methodology or a link to a code repository. |
| Open Datasets | No | Mining domain: the set of subtasks and preconditions is hand-coded based on the crafting recipes in Minecraft and used as a template to generate 640 random subtask graphs. Playground: 'We randomly generated 500 graphs for training and 2,000 graphs for testing.' The subtask-graph datasets were generated by the authors for their experiments, and no public access information is provided. |
| Dataset Splits | No | For the Mining domain, the paper states: 'We used 200 for training and 440 for testing.' For the Playground domain: 'We randomly generated 500 graphs for training and 2,000 graphs for testing.' No explicit validation split is mentioned (a split sketch appears after this table). |
| Hardware Specification | No | The paper does not specify the particular hardware (e.g., GPU models, CPU types, or memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions certain methods and frameworks (e.g., actor-critic, MazeBase), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The paper reports learning rates η_d = 1e-4, η_c = 3e-6 for distillation and η_ac = 1e-6, η_c = 3e-7 for fine-tuning (a configuration sketch appears after this table). |
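The paper's Algorithm 1 covers policy optimization with an actor-critic method. Below is a minimal, generic advantage actor-critic sketch, not the paper's exact algorithm or network architecture; the toy dimensions, loss weights, and random stand-in rollout data are illustrative assumptions.

```python
# Minimal advantage actor-critic sketch (generic; NOT the paper's exact Algorithm 1).
# All dimensions, loss weights, and the toy data below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

obs_dim, n_actions = 16, 4  # assumed sizes, for illustration only

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.pi = nn.Linear(64, n_actions)  # policy head (actor)
        self.v = nn.Linear(64, 1)           # value head (critic)

    def forward(self, obs):
        h = self.body(obs)
        return Categorical(logits=self.pi(h)), self.v(h).squeeze(-1)

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def update(obs, actions, returns):
    """One actor-critic step: policy gradient on the advantage plus value regression."""
    dist, value = model(obs)
    advantage = returns - value.detach()
    policy_loss = -(dist.log_prob(actions) * advantage).mean()
    value_loss = (returns - value).pow(2).mean()
    entropy_bonus = dist.entropy().mean()  # encourages exploration
    loss = policy_loss + 0.5 * value_loss - 0.01 * entropy_bonus
    opt.zero_grad()
    loss.backward()
    opt.step()

# Toy usage with random tensors standing in for a batch of rollouts:
update(torch.randn(32, obs_dim), torch.randint(n_actions, (32,)), torch.randn(32))
```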
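The reported splits (Playground: 500 train / 2,000 test; Mining: 200 train / 440 test out of 640 generated graphs) amount to a disjoint partition of the generated subtask graphs. This is a hypothetical sketch: the placeholder graph IDs, the `split_graphs` helper, and the seed are assumptions, and only the counts come from the paper.

```python
# Hypothetical sketch of the train/test partition of generated subtask graphs.
# Graphs are represented by placeholder IDs; only the split sizes are from the paper.
import random

def split_graphs(graphs, n_train, n_test, seed=0):
    """Shuffle generated graphs and take disjoint train/test subsets."""
    assert len(graphs) >= n_train + n_test
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = graphs[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

playground = list(range(2500))  # 500 training + 2,000 testing graphs
mining = list(range(640))       # 640 generated graphs: 200 training + 440 testing
pg_train, pg_test = split_graphs(playground, 500, 2000)
mn_train, mn_test = split_graphs(mining, 200, 440)
```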
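The two-phase learning-rate setup can be captured in a small configuration object. The η values are taken from the paper; the `PhaseConfig` dataclass and its field names are illustrative assumptions.

```python
# Sketch of the two-phase optimization schedule reported in the paper.
# The learning-rate values are from the paper; the dataclass layout and
# field names (eta_*) are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class PhaseConfig:
    eta_policy: float  # learning rate for the policy objective (η_d or η_ac)
    eta_critic: float  # learning rate for the critic (η_c)

DISTILLATION = PhaseConfig(eta_policy=1e-4, eta_critic=3e-6)  # η_d, η_c
FINE_TUNING = PhaseConfig(eta_policy=1e-6, eta_critic=3e-7)   # η_ac, η_c
```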