Large-Scale Study of Curiosity-Driven Learning

Authors: Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, Alexei A. Efros

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance as well as a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many games. In this paper, we perform a large-scale empirical study of agents driven purely by intrinsic rewards across a range of diverse simulated environments.
Researcher Affiliation | -1 | Anonymous authors. Paper under double-blind review.
Pseudocode | Yes | Algorithm 1: Curiosity-driven Learning (see the sketch below the table).
Open Source Code | Yes | Game-play videos and code are at https://doubleblindsupplementary.github.io/large-curiosity/. Video results, code and models at https://doubleblindsupplementary.github.io/large-curiosity/. We have released the training code and environments on our website.
Open Datasets | Yes | We pick a total of 54 diverse simulated environments, as shown in Figure 1, including 48 Atari games (Bellemare et al., 2013), Super Mario Bros., 2 Roboschool scenarios (Schulman et al., 2017), Two-player Pong, 2 Unity mazes (Juliani et al., 2018). (An environment-setup sketch follows the table below.)
Dataset Splits | No | The paper describes training and evaluating on different environments/levels (e.g., pre-training on Mario Level 1-1 and testing on Level 1-2 or 1-3) and mentions running multiple trials with different seeds. However, it does not specify explicit percentages or counts for static training/validation/test splits of the kind used in supervised learning, which would be needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. It only mentions the number of parallel environments used.
Software Dependencies | No | The paper mentions software components such as the PPO algorithm, the Unity ML-agents framework, and the ADAM optimizer, but does not provide version numbers for these or for any other dependencies, such as programming languages or libraries.
Experiment Setup | Yes | Hyper-parameters: We used a learning rate of 0.0001 for all networks. In most experiments, we used 128 parallel environments, with the exceptions of the Unity and Roboschool experiments, where we could only run 32 parallel environments, and the large-scale Mario experiment, where we used 1024. We used rollouts of length 128 in all experiments except for the Unity experiments, where we used rollouts of length 512 so that the network could quickly latch onto the sparse reward. In the initial 9 experiments on Mario and Atari, we used 3 optimization epochs per rollout in the interest of speed. In the Mario scaling and generalization experiments, as well as the Roboschool experiments, we used 6 epochs. In the Unity experiments, we used 8 epochs, again to more quickly take advantage of sparse rewards. (A config sketch of these settings follows the table below.)
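
Algorithm 1 in the paper describes the curiosity-driven loop: collect rollouts with the current policy, score each transition by the prediction error of a learned forward-dynamics model, and optimize the policy (with PPO) on that intrinsic reward alone. Below is a minimal, hedged sketch of the intrinsic-reward computation; the class names (FeatureEncoder, ForwardDynamics) and the PyTorch framing are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Maps an observation to a compact feature vector phi(s)."""
    def __init__(self, obs_dim: int, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class ForwardDynamics(nn.Module):
    """Predicts phi(s_{t+1}) from phi(s_t) and a one-hot action."""
    def __init__(self, feat_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, feat, act_onehot):
        return self.net(torch.cat([feat, act_onehot], dim=-1))

def intrinsic_reward(encoder, dynamics, obs, act_onehot, next_obs):
    """Curiosity reward: squared error between predicted and actual next features."""
    with torch.no_grad():
        feat = encoder(obs)
        next_feat = encoder(next_obs)
        pred = dynamics(feat, act_onehot)
    return 0.5 * (pred - next_feat).pow(2).sum(dim=-1)

# Toy usage: a batch of 4 transitions with 8-dim observations and 3 discrete actions.
obs = torch.randn(4, 8)
next_obs = torch.randn(4, 8)
act_onehot = nn.functional.one_hot(torch.randint(0, 3, (4,)), num_classes=3).float()
enc, dyn = FeatureEncoder(8), ForwardDynamics(32, 3)
r_int = intrinsic_reward(enc, dyn, obs, act_onehot, next_obs)  # shape: (4,)

In the paper, the same prediction error also serves as the training loss for the dynamics model, so the intrinsic reward is non-stationary: as the model improves on familiar states, the agent is pushed toward states it cannot yet predict.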
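
The environments listed under "Open Datasets" are standard, publicly available simulators. The sketch below shows how two of the Atari games might be instantiated through the OpenAI Gym API; the exact environment IDs, wrappers, and Gym version used by the authors are not stated in the report, so the IDs and the classic 4-tuple step API are assumptions (and running this requires the gym[atari] extra with the ROMs installed).

import gym

# Two of the 48 Atari games from the benchmark suite; IDs are assumptions based
# on standard Gym/ALE releases of that era.
atari_ids = ["BreakoutNoFrameskip-v4", "MontezumaRevengeNoFrameskip-v4"]

for env_id in atari_ids:
    env = gym.make(env_id)
    obs = env.reset()
    # A purely curiosity-driven agent ignores the extrinsic `reward` returned
    # here and learns only from its own intrinsic (prediction-error) reward.
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()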
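
For reference, the hyper-parameters quoted in the "Experiment Setup" row can be collected into a single configuration sketch. The field names below are my own; the values come directly from the quoted text, and anything the paper does not state (minibatch size, Adam betas, PPO clipping coefficient, etc.) is deliberately left out.

# Hedged summary of the reported hyper-parameters; not a file from the authors' repo.
EXPERIMENT_CONFIG = {
    "learning_rate": 1e-4,              # used for all networks
    "optimizer": "Adam",                # stated as ADAM; no further settings given
    "policy_algorithm": "PPO",
    "num_parallel_envs": {
        "default": 128,
        "unity": 32,
        "roboschool": 32,
        "large_scale_mario": 1024,
    },
    "rollout_length": {
        "default": 128,
        "unity": 512,                   # longer rollouts to latch onto sparse reward
    },
    "optimization_epochs_per_rollout": {
        "initial_mario_atari": 3,       # in the interest of speed
        "mario_scaling_generalization": 6,
        "roboschool": 6,
        "unity": 8,                     # to exploit sparse rewards more quickly
    },
}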