Distral: Robust multitask reinforcement learning
Authors: Yee Teh, Victor Bapst, Wojciech M. Czarnecki, John Quan, James Kirkpatrick, Raia Hadsell, Nicolas Heess, Razvan Pascanu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. |
| Researcher Affiliation | Industry | DeepMind, London, UK |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with clear labels. |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link or an explicit code release statement. |
| Open Datasets | No | The paper uses custom environments (grid world, 3D mazes, navigation, laser-tag) but does not provide access information (link, DOI, citation with authors/year) for these or any other public datasets. |
| Dataset Splits | No | The paper discusses learning curves and performance but does not specify exact percentages or sample counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions a 'distributed Python/TensorFlow code base' but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions 'Python/TensorFlow' but does not specify version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | We tried three values for the entropy costs β and three learning rates. Four runs for each hyperparameter setting were used. All other hyperparameters were fixed to the single-task A3C defaults and, for the KL+ent 1col and KL+ent 2col algorithms, α was fixed at 0.5. |
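The sweep described above (three entropy costs, three learning rates, four runs per setting) can be sketched as a simple grid enumeration. The specific β and learning-rate values below are illustrative assumptions, not taken from the paper:

```python
from itertools import product

# Hypothetical sweep mirroring the described setup: 3 entropy costs x
# 3 learning rates x 4 seeds. The numeric values are assumed placeholders;
# the paper reports only the grid shape, not the values.
entropy_costs = [1e-4, 1e-3, 1e-2]   # candidate values for beta (assumed)
learning_rates = [1e-4, 2e-4, 4e-4]  # candidate learning rates (assumed)
seeds = range(4)                     # four runs per hyperparameter setting

runs = [
    {"beta": beta, "lr": lr, "seed": seed}
    for beta, lr, seed in product(entropy_costs, learning_rates, seeds)
]

# 3 * 3 * 4 = 36 runs per algorithm variant
print(len(runs))
```

Enumerating the grid up front makes the total experimental budget explicit: 36 runs per algorithm variant before any other hyperparameters (fixed to single-task A3C defaults) are varied.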