Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Authors: YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, Chelsea Finn

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that, using our approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis reveals that the compositional nature of language is critical for learning diverse sub-skills and systematically generalizing to new sub-skills in comparison to non-compositional abstractions that use the same supervision.
Researcher Affiliation | Industry | Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn; Google Research; {ydjiang,shanegu,kpmurphy,chelseaf}@google.com
Pseudocode | Yes | Pseudocode for the method can be found in Algorithm 2 in Appendix C.2, and an illustration of the process can be found in Figure 1.
Open Source Code | Yes | Code and videos of the environment and experiments are at https://sites.google.com/view/hal-demo
Open Datasets | Yes | To empirically study the role of language abstractions for long-horizon tasks, we introduce a new environment inspired by the CLEVR engine [28] that consists of procedurally-generated scenes of objects that are paired with programmatically-generated language descriptions. (A toy sketch of this scene-instruction pairing appears below the table.)
Dataset Splits | Yes | To evaluate this (i.e., (3)), we design training and test instruction sets that are systematically distinct. We evaluate the agent's ability to perform such generalization by splitting the 600 instruction sets through the following procedure: (i) standard: a random 70/30 split of the instruction set; (ii) systematic: the training set consists only of instructions that do not contain the word red in the first half of the instruction, and the test set contains only those that do. (The split procedure is sketched below the table.)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned.
Software Dependencies | No | No specific ancillary software details, such as library or solver names with version numbers, were provided. The paper mentions the "MuJoCo physics engine [66]" and the "CLEVR engine [28]", but without version numbers for the engines themselves or for any deep learning frameworks/libraries.
Experiment Setup | Yes | For all experiments, we use Adam [31] as our optimizer with a learning rate of 1e-4. The replay buffer size is 10^5, batch size 128, and discount factor 0.99. We use a target network update frequency of 5000 environment steps. (These hyperparameters are collected in a sketch below.)
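
As a toy illustration of the Open Datasets row, here is a minimal sketch of pairing procedurally-generated scenes with programmatically-generated language. The color/shape vocabularies and the single instruction template are assumptions for illustration only; the actual CLEVR-based generator is far richer:

```python
import random

# Hypothetical attribute vocabularies (assumption, not from the paper).
COLORS = ["red", "blue", "green", "cyan", "purple"]
SHAPES = ["cube", "sphere", "cylinder"]

def sample_scene(num_objects=5, rng=random):
    """Procedurally generate a scene as a list of attributed objects."""
    return [{"color": rng.choice(COLORS), "shape": rng.choice(SHAPES)}
            for _ in range(num_objects)]

def describe_goal(scene, rng=random):
    """Programmatically generate one templated instruction for a scene."""
    a, b = rng.sample(scene, 2)
    relation = rng.choice(["in front of", "behind",
                           "to the left of", "to the right of"])
    return (f"Move the {a['color']} {a['shape']} {relation} "
            f"the {b['color']} {b['shape']}")

scene = sample_scene()
instruction = describe_goal(scene)  # e.g. "Move the red cube behind the blue sphere"
```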
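The Dataset Splits row quotes two split protocols. Below is a minimal sketch of both, assuming each element is a single instruction string; reading the systematic criterion as "the word red appears among the first half of the instruction's tokens" is itself an assumption:

```python
import random

def red_in_first_half(instruction):
    # Assumed reading of the criterion: "red" occurs among the first
    # half of the instruction's tokens.
    words = instruction.lower().split()
    return "red" in words[: len(words) // 2]

def split_instructions(instructions, mode="standard", seed=0):
    """Build train/test sets per the two quoted protocols."""
    if mode == "standard":
        # (i) standard: random 70/30 split.
        shuffled = list(instructions)
        random.Random(seed).shuffle(shuffled)
        cut = int(0.7 * len(shuffled))
        return shuffled[:cut], shuffled[cut:]
    if mode == "systematic":
        # (ii) systematic: hold out every instruction whose first half
        # contains "red"; train on the rest.
        train = [s for s in instructions if not red_in_first_half(s)]
        test = [s for s in instructions if red_in_first_half(s)]
        return train, test
    raise ValueError(f"unknown mode: {mode!r}")
```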
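The Experiment Setup row fixes the optimizer and replay hyperparameters. Here is a minimal PyTorch sketch assuming a DQN-style agent; make_q_net, obs_dim, and n_actions are hypothetical placeholders not taken from the paper:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
LEARNING_RATE = 1e-4
REPLAY_BUFFER_SIZE = 10**5
BATCH_SIZE = 128
DISCOUNT = 0.99             # gamma
TARGET_UPDATE_STEPS = 5000  # environment steps between hard target updates

def make_q_net(obs_dim=64, n_actions=8):
    # Placeholder architecture; the paper's actual networks are not given here.
    return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                         nn.Linear(256, n_actions))

q_net = make_q_net()
target_net = make_q_net()
target_net.load_state_dict(q_net.state_dict())

optimizer = torch.optim.Adam(q_net.parameters(), lr=LEARNING_RATE)

def maybe_sync_target(env_step):
    # Hard-copy online weights into the target network every 5000 env steps.
    if env_step % TARGET_UPDATE_STEPS == 0:
        target_net.load_state_dict(q_net.state_dict())
```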