Behaviour Suite for Reinforcement Learning
Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. |
| Researcher Affiliation | Industry | Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt. Corresponding author iosband@google.com. |
| Pseudocode | No | The paper describes the experiments and analysis but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite. |
| Open Datasets | Yes | Contextual bandit classification of MNIST with ±1 rewards (LeCun et al., 1998). |
| Dataset Splits | No | The paper describes running agents for a certain number of episodes (e.g., 10k episodes, 1k episodes) and using multiple seeds (e.g., 20 seeds, 4 seeds each), and varying problem parameters, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for reproducing data partitioning. |
| Hardware Specification | No | The paper mentions 'Launch scripts for Google cloud that automate large scale compute at low cost' and 'Fast: iteration from launch to results in under 30min on standard CPU', but it does not provide specific hardware details such as exact GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'Our code is Python', 'Tensorflow (Abadi et al., 2015)', and includes examples with 'OpenAI Baselines, Dopamine', but it does not specify exact version numbers for these software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | "For the bsuite experiment we run the agent on sizes N = 1, .., 100 exponentially spaced and look at the average regret compared to optimal after 10k episodes. The summary score is the percentage of runs for which the average regret is less than 75% of that achieved by a uniformly random policy." and "In each case we tune a learning rate to optimize performance on basic tasks from {1e-1, 1e-2, 1e-3}, keeping all other parameters constant at default value." |
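The summary score quoted above can be sketched in a few lines of plain Python: the fraction of runs whose average regret falls below 75% of a uniform-random baseline. The function name and the sample regret values are illustrative assumptions, not code from the bsuite repository.

```python
def summary_score(avg_regrets, uniform_regret):
    """Fraction of runs whose average regret is below 75% of the
    regret achieved by a uniformly random policy (bsuite-style score)."""
    threshold = 0.75 * uniform_regret
    passing = sum(1 for r in avg_regrets if r < threshold)
    return passing / len(avg_regrets)

# Hypothetical per-run average regrets after 10k episodes,
# against a uniform-random baseline regret of 1.0.
regrets = [0.1, 0.5, 0.9, 0.2]
print(summary_score(regrets, uniform_regret=1.0))  # 0.75
```

Three of the four hypothetical runs beat the 0.75 threshold, so the score is 0.75; sweeping this over the exponentially spaced sizes N would yield the per-experiment score reported in the paper.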