Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Authors: Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions. |
| Researcher Affiliation | Collaboration | ¹University of Michigan, ²Google Brain, ³Microsoft Research. |
| Pseudocode | Yes | Algorithm 1 Subtask update (Soft) |
| Open Source Code | No | The demo videos are available at the following website: https://sites.google.com/a/umich.edu/junhyuk-oh/task-generalization. This link is for demo videos, not source code for the methodology. |
| Open Datasets | No | We developed a 3D visual environment using Minecraft based on Oh et al. (2016) as shown in Figure 1. This describes a custom environment, and while it cites a paper, it does not provide concrete access information for the specific data used. |
| Dataset Splits | No | The paper mentions training, evaluation, and test sets, but does not provide specific percentages, sample counts, or clear predefined splits for training, validation, or test sets. |
| Hardware Specification | No | The paper mentions '16 CPU threads' but does not specify any particular CPU model, GPU, or other hardware components used for running experiments. |
| Software Dependencies | No | The paper refers to using 'actor-critic method' and 'LSTM' but does not specify any software names with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The network architecture of our parameterized skill consists of 4 convolution layers and one LSTM (Hochreiter and Schmidhuber, 1997) layer. We conducted curriculum training by changing the size of the world, the density of object and walls according to the agent's success rate. We implemented actor-critic method with 16 CPU threads based on Sukhbaatar et al. (2015). The parameters are updated after 8 episodes for each thread. |
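
The curriculum described in the Experiment Setup row (world size and object/wall density adjusted according to the agent's success rate) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the report gives no thresholds, window lengths, or level values, so `CurriculumLevel`, `LEVELS`, `SuccessRateCurriculum`, `promote_at`, and `window` are all hypothetical names and numbers chosen for the example, and the sketch only ever moves to harder levels.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CurriculumLevel:
    world_size: int        # side length of the world (hypothetical unit)
    object_density: float  # fraction of cells holding objects (hypothetical)
    wall_density: float    # fraction of cells holding walls (hypothetical)


# Hypothetical difficulty ladder; the paper excerpt does not list these values.
LEVELS = [
    CurriculumLevel(5, 0.10, 0.00),
    CurriculumLevel(7, 0.15, 0.05),
    CurriculumLevel(10, 0.20, 0.10),
]


class SuccessRateCurriculum:
    """Move to a harder level once the recent success rate is high enough."""

    def __init__(self, levels, promote_at=0.9, window=100):
        self.levels = levels
        self.promote_at = promote_at  # assumed promotion threshold
        self.window = window          # assumed number of episodes averaged
        self.index = 0
        self.results = deque(maxlen=window)  # keeps only the latest outcomes

    @property
    def level(self):
        return self.levels[self.index]

    def record(self, success):
        """Record one episode outcome; promote if a full window qualifies."""
        self.results.append(bool(success))
        if len(self.results) < self.window:
            return  # not enough episodes at this level yet
        rate = sum(self.results) / self.window
        if rate >= self.promote_at and self.index < len(self.levels) - 1:
            self.index += 1
            self.results.clear()  # start a fresh window at the new level
```

In a training loop, each finished episode would call `record(...)` and the next episode's environment would be built from `curriculum.level`. How this interacts with the 16 actor-critic threads (for example, whether each thread tracks its own success rate) is not stated in the report and is left open here.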