Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

Authors: Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions."
Researcher Affiliation | Collaboration | University of Michigan, Google Brain, Microsoft Research.
Pseudocode | Yes | Algorithm 1: Subtask update (Soft).
Open Source Code | No | "The demo videos are available at the following website: https://sites.google.com/a/umich.edu/junhyuk-oh/task-generalization." This link points to demo videos, not to source code for the method.
Open Datasets | No | "We developed a 3D visual environment using Minecraft based on Oh et al. (2016) as shown in Figure 1." This describes a custom environment; although it cites prior work, the paper gives no concrete access information for the environment or data used.
Dataset Splits | No | The paper mentions training, evaluation, and test sets but does not give percentages, sample counts, or clearly predefined training, validation, or test splits.
Hardware Specification | No | The paper mentions "16 CPU threads" but does not specify a CPU model, GPU, or any other hardware used to run the experiments.
Software Dependencies | No | The paper refers to an "actor-critic method" and an "LSTM" but names no software packages with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "The network architecture of our parameterized skill consists of 4 convolution layers and one LSTM (Hochreiter and Schmidhuber, 1997) layer. We conducted curriculum training by changing the size of the world, the density of object and walls according to the agent's success rate. We implemented actor-critic method with 16 CPU threads based on Sukhbaatar et al. (2015). The parameters are updated after 8 episodes for each thread."
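The Pseudocode row only names Algorithm 1 (Subtask update, Soft). As a rough, hedged illustration of what a "soft" subtask update can mean, the Python sketch below keeps a probability distribution (a soft pointer) over instruction slots, moves it by mixing shifts of -1, 0 and +1 slots in the style of location-based addressing from Neural Turing Machines, and then reads a blended instruction embedding. The function names, the clamping at the ends of the instruction list, and the toy embeddings are assumptions made for illustration; this is not a transcription of the paper's Algorithm 1.

```python
from typing import List, Sequence


def soft_shift_pointer(pointer: Sequence[float], shift: Sequence[float]) -> List[float]:
    """Move a soft pointer over M instruction slots.

    pointer: probability mass over the slots (sums to 1).
    shift:   probabilities of moving -1, 0, +1 slots, in that order (sums to 1).
    Mass that would move past either end is clamped to the boundary slot
    (an assumption; circular shifting is another common choice).
    """
    if len(shift) != 3:
        raise ValueError("shift must have 3 entries for moves -1, 0, +1")
    m = len(pointer)
    new = [0.0] * m
    for i, p in enumerate(pointer):
        for move, w in zip((-1, 0, 1), shift):
            j = min(max(i + move, 0), m - 1)  # clamp at the ends of the list
            new[j] += p * w
    return new


def read_instruction(pointer: Sequence[float], memory: Sequence[Sequence[float]]) -> List[float]:
    """Blend instruction embeddings (one row per slot) by the soft pointer."""
    dim = len(memory[0])
    return [sum(p * row[k] for p, row in zip(pointer, memory)) for k in range(dim)]


# Usage: start on the first instruction and mostly advance to the next one.
pointer = soft_shift_pointer([1.0, 0.0, 0.0], [0.0, 0.2, 0.8])
print(pointer)  # [0.2, 0.8, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings (hypothetical)
print(read_instruction(pointer, memory))  # [0.2, 0.8]
```

Because the pointer stays a distribution rather than a hard index, the read-out is differentiable with respect to the shift probabilities, which is the usual motivation for a soft update; a hard variant would instead sample a single move.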
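The Experiment Setup row quotes a curriculum that changes the world size and the object and wall density according to the agent's success rate. The sketch below is one minimal way to implement such a schedule, assuming a fixed list of difficulty levels and promotion once the success rate over a sliding window reaches a threshold. The `Level` fields, the threshold, the window size, and the level values are all hypothetical; the paper's actual schedule is not given here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Level:
    world_size: int        # side length of a square map (hypothetical)
    object_density: float  # fraction of cells holding objects (hypothetical)
    wall_density: float    # fraction of cells holding walls (hypothetical)


class SuccessRateCurriculum:
    """Advance to a harder level once the recent success rate is high enough."""

    def __init__(self, levels: List[Level], threshold: float = 0.8, window: int = 100):
        self.levels = levels
        self.threshold = threshold
        self.window = window
        self.index = 0
        self.recent: Deque[bool] = deque(maxlen=window)

    @property
    def level(self) -> Level:
        return self.levels[self.index]

    def record(self, success: bool) -> Level:
        """Log one episode outcome and return the level for the next episode."""
        self.recent.append(success)
        full = len(self.recent) == self.window
        rate = sum(self.recent) / len(self.recent)
        if full and rate >= self.threshold and self.index < len(self.levels) - 1:
            self.index += 1
            self.recent.clear()  # re-measure success on the new level
        return self.level


# Usage with made-up levels: ten straight successes promote to level 1.
levels = [Level(5, 0.1, 0.0), Level(7, 0.15, 0.05), Level(10, 0.2, 0.1)]
cur = SuccessRateCurriculum(levels, threshold=0.8, window=10)
for _ in range(10):
    cur.record(True)
print(cur.level)  # Level(world_size=7, object_density=0.15, wall_density=0.05)
```

Waiting for a full window before promoting avoids advancing on a short lucky streak. This sketch only moves forward; stepping back to an easier level when performance collapses would be a straightforward extension.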
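The Experiment Setup row also reports an actor-critic method run with 16 CPU threads, with parameters updated after 8 episodes per thread. The sketch below illustrates only that batching schedule plus a simple advantage estimate (Monte Carlo return minus the critic's value). `run_episode` is a hypothetical stand-in that generates toy rollouts, the discount factor is assumed, and the shared learner only counts updates instead of training a network; none of this reproduces the paper's code, which the table lists as unavailable.

```python
import random
import threading
from typing import List, Tuple

NUM_THREADS = 16          # reported in the paper: 16 CPU threads
EPISODES_PER_UPDATE = 8   # reported in the paper: update after 8 episodes per thread
GAMMA = 0.99              # discount factor: an assumption, not from the paper

Episode = Tuple[List[float], List[float]]  # (rewards, critic values)


def discounted_returns(rewards: List[float], gamma: float = GAMMA) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of a finished episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def advantages(rewards: List[float], values: List[float], gamma: float = GAMMA) -> List[float]:
    """Advantage estimate A_t = G_t - V(s_t) that would weight the policy gradient."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]


def run_episode(rng: random.Random) -> Episode:
    """Hypothetical stand-in for one rollout with a sparse terminal reward."""
    length = rng.randint(3, 6)
    rewards = [0.0] * (length - 1) + [1.0]
    values = [rng.random() for _ in range(length)]  # fake critic outputs
    return rewards, values


class SharedLearner:
    """Toy shared learner; a real one would hold and update network parameters."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.num_updates = 0
        self.num_episodes = 0
        self.advantage_sum = 0.0

    def apply_update(self, batch: List[Episode]) -> None:
        # A real agent would use these advantages to scale log-probability
        # gradients (actor) and minimize their square (critic).
        total = sum(sum(advantages(r, v)) for r, v in batch)
        with self.lock:  # serialize writes to the shared state
            self.num_updates += 1
            self.num_episodes += len(batch)
            self.advantage_sum += total


def worker(learner: SharedLearner, seed: int, episodes: int) -> None:
    """Collect episodes and submit one update per EPISODES_PER_UPDATE episodes."""
    rng = random.Random(seed)
    batch: List[Episode] = []
    for _ in range(episodes):
        batch.append(run_episode(rng))
        if len(batch) == EPISODES_PER_UPDATE:
            learner.apply_update(batch)
            batch = []  # leftover episodes at the end are dropped


def train(num_threads: int = NUM_THREADS, episodes_per_thread: int = 16) -> SharedLearner:
    learner = SharedLearner()
    threads = [
        threading.Thread(target=worker, args=(learner, seed, episodes_per_thread))
        for seed in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return learner


learner = train()
print(learner.num_updates, learner.num_episodes)  # 32 256
```

Python threads share one interpreter lock, so here they only model the schedule (16 workers running 16 episodes each submit 2 updates apiece, 32 in total), not a parallel speed-up. Episodes left over when a worker's count is not a multiple of 8 are simply dropped in this sketch.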