Behaviour Suite for Reinforcement Learning

Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to study agent behaviour through their performance on these shared benchmarks. To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite.
Researcher Affiliation | Industry | Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt. Corresponding author iosband@google.com.
Pseudocode | No | The paper describes the experiments and analysis but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | To complement this effort, we open source github.com/deepmind/bsuite, which automates evaluation and analysis of any agent on bsuite.
Open Datasets | Yes | Contextual bandit classification of MNIST with ±1 rewards (LeCun et al., 1998).
Dataset Splits | No | The paper describes running agents for a fixed number of episodes (e.g., 10k episodes, 1k episodes), using multiple seeds (e.g., 20 seeds, 4 seeds each), and varying problem parameters, but it does not specify training, validation, or test splits as percentages or counts.
Hardware Specification | No | The paper mentions 'Launch scripts for Google cloud that automate large scale compute at low cost' and 'Fast: iteration from launch to results in under 30min on standard CPU', but it does not give specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions 'Our code is Python' and 'Tensorflow (Abadi et al., 2015)', and includes examples with 'OpenAI Baselines, Dopamine', but it does not give the version numbers for these dependencies that reproducibility requires.
Experiment Setup | Yes | "For the bsuite experiment we run the agent on sizes N = 1, .., 100 exponentially spaced and look at the average regret compared to optimal after 10k episodes. The summary score is the percentage of runs for which the average regret is less than 75% of that achieved by a uniformly random policy." and "In each case we tune a learning rate to optimize performance on basic tasks from {1e-1, 1e-2, 1e-3}, keeping all other parameters constant at default value."
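
The scoring rule quoted under Experiment Setup can be illustrated with a minimal Python sketch. This is not the bsuite implementation: the helper names `exponentially_spaced_sizes` and `summary_score`, the number of sizes, and the example regret values are assumptions made for illustration. Only the 75% threshold and the range N = 1 to 100 come from the quoted text.

```python
def exponentially_spaced_sizes(low, high, count):
    """Return up to `count` integer sizes spaced geometrically from low to high.

    Hypothetical helper: the paper says sizes are "exponentially spaced"
    but this summary does not say how many sizes are used.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    raw = (low * (high / low) ** (i / (count - 1)) for i in range(count))
    # Round to integers and drop duplicates that rounding may create.
    return sorted({round(x) for x in raw})


def summary_score(avg_regrets, random_regret, threshold=0.75):
    """Percentage of runs whose average regret is below
    `threshold` times the regret of a uniformly random policy."""
    if not avg_regrets:
        raise ValueError("need at least one run")
    passed = sum(1 for r in avg_regrets if r < threshold * random_regret)
    return 100.0 * passed / len(avg_regrets)


if __name__ == "__main__":
    sizes = exponentially_spaced_sizes(1, 100, 5)
    # Made-up average regrets, one per run, for illustration only.
    regrets = [0.1, 0.5, 0.8, 0.74]
    print(sizes, summary_score(regrets, random_regret=1.0))
```

A run counts toward the score only when its average regret is strictly below the threshold, matching the "less than 75%" wording in the quote.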