Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Authors: YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, Chelsea Finn
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that, using our approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis reveals that the compositional nature of language is critical for learning diverse sub-skills and systematically generalizing to new sub-skills in comparison to non-compositional abstractions that use the same supervision. |
| Researcher Affiliation | Industry | Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn, Google Research {ydjiang,shanegu,kpmurphy,chelseaf}@google.com |
| Pseudocode | Yes | Pseudocode for the method can be found in Algorithm 2 in Appendix C.2 and an illustration of the process can be found in Figure 1. |
| Open Source Code | Yes | Code and videos of the environment, and experiments are at https://sites.google.com/view/hal-demo |
| Open Datasets | Yes | To empirically study the role of language abstractions for long-horizon tasks, we introduce a new environment inspired by the CLEVR engine [28] that consists of procedurally-generated scenes of objects that are paired with programmatically-generated language descriptions. |
| Dataset Splits | Yes | To evaluate this (i.e. (3)), we design training and test instruction sets that are systematically distinct. We evaluate the agent's ability to perform such generalization by splitting the 600 instruction sets through the following procedure: (i) standard: random 70/30 split of the instruction set; (ii) systematic: the training set consists only of instructions that do not contain the word red in the first half of the instruction, and the test set contains only those that have red in the first half of the instruction. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | No specific ancillary software details, such as library or solver names with version numbers, were provided. The paper mentions the "MuJoCo physics engine [66]" and "CLEVR engine [28]" but without version numbers for the engines themselves or any deep learning frameworks/libraries. |
| Experiment Setup | Yes | For all experiments, we use Adam [31] as our optimizer with a learning rate of 1e-4. The replay buffer size is 10^5, batch size 128, and discount factor 0.99. We use a target network update frequency of 5000 environment steps. |
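The two instruction-set splits described in the Dataset Splits row can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and the word-level definition of an instruction's "first half" are assumptions.

```python
import random

def standard_split(instructions, train_frac=0.7, seed=0):
    """Random 70/30 split of the instruction set (split (i) in the paper)."""
    rng = random.Random(seed)
    shuffled = list(instructions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def systematic_split(instructions):
    """Split (ii): train on instructions without 'red' in their first half,
    test on instructions with 'red' in their first half."""
    def red_in_first_half(instruction):
        words = instruction.split()
        return "red" in words[: len(words) // 2]

    train = [i for i in instructions if not red_in_first_half(i)]
    test = [i for i in instructions if red_in_first_half(i)]
    return train, test
```

The systematic split makes train and test systematically distinct by construction, so performance on the test set measures compositional generalization rather than memorization.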
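The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which is how one would reproduce the reported training loop settings. This is a minimal sketch: the class and field names are illustrative, not taken from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    optimizer: str = "adam"          # Adam [31]
    learning_rate: float = 1e-4
    replay_buffer_size: int = 10**5
    batch_size: int = 128
    discount: float = 0.99
    target_update_freq: int = 5000   # environment steps between target-network syncs

def should_update_target(env_step: int, cfg: TrainConfig) -> bool:
    """True on every environment step at which the target network is synced."""
    return env_step > 0 and env_step % cfg.target_update_freq == 0
```

With an off-policy learner, `should_update_target` would be checked once per environment step inside the training loop.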