Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Authors: Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli
ICML 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on a stochastic 3D domain show that the proposed ideas are crucial for generalization to longer instructions as well as unseen instructions. |
| Researcher Affiliation | Collaboration | University of Michigan, Google Brain, Microsoft Research. |
| Pseudocode | Yes | Algorithm 1 Subtask update (Soft) |
| Open Source Code | No | The demo videos are available at the following website: https://sites.google.com/a/umich.edu/junhyuk-oh/task-generalization. This link is for demo videos, not source code for the methodology. |
| Open Datasets | No | We developed a 3D visual environment using Minecraft based on Oh et al. (2016) as shown in Figure 1. This describes a custom environment, and while it cites a paper, it does not provide concrete access information for the specific data used. |
| Dataset Splits | No | The paper mentions training, evaluation, and test sets, but does not provide specific percentages, sample counts, or clear predefined splits for training, validation, or test sets. |
| Hardware Specification | No | The paper mentions '16 CPU threads' but does not specify any particular CPU model, GPU, or other hardware components used for running experiments. |
| Software Dependencies | No | The paper refers to using 'actor-critic method' and 'LSTM' but does not specify any software names with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | The network architecture of our parameterized skill consists of 4 convolution layers and one LSTM (Hochreiter and Schmidhuber, 1997) layer. We conducted curriculum training by changing the size of the world and the density of objects and walls according to the agent's success rate. We implemented an actor-critic method with 16 CPU threads based on Sukhbaatar et al. (2015). The parameters are updated after 8 episodes for each thread. |
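The quoted setup names the architecture's skeleton (4 convolution layers feeding one LSTM) but not its hyperparameters. The sketch below is a minimal illustration of how such a conv stack determines the LSTM's input width; the input resolution, kernel sizes, strides, and channel count are assumptions for illustration, not values reported in the paper.

```python
# Hypothetical sketch of a 4-conv + LSTM skeleton as described in the
# Experiment Setup row. All numeric choices below are assumptions.

def conv_out(size, kernel, stride, pad=0):
    # Spatial size after one convolution (standard floor formula).
    return (size + 2 * pad - kernel) // stride + 1

# Assumed 84x84 visual observation; four conv layers as (kernel, stride).
layers = [(8, 4), (4, 2), (3, 1), (3, 1)]

size = 84
for kernel, stride in layers:
    size = conv_out(size, kernel, stride)

channels = 64  # assumed channel count of the final conv layer
lstm_input_dim = channels * size * size  # flattened features fed to the LSTM
print(size, lstm_input_dim)  # prints: 5 1600
```

With these assumed shapes, the flattened 5x5x64 feature map would be the per-timestep input to the single LSTM layer; the actual dimensions in the paper may differ.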