reproducibilityindex.ai

Universal Reinforcement Learning Algorithms: Survey and Experiments

Authors: John Aslanides, Jan Leike, Marcus Hutter

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas. ... The contribution of this paper is three-fold: we present a survey of these URL algorithms, and unify their presentation under a consistent vocabulary. We illuminate these agents with an empirical investigation into their behavior and properties.
Researcher Affiliation	Collaboration	John Aslanides, Jan Leike, Marcus Hutter; Australian National University; Future of Humanity Institute, University of Oxford; {john.aslanides, marcus.hutter}@anu.edu.au, leike@google.com
Pseudocode	Yes	Algorithm 1 Bayesian URL agent [Hutter, 2005]; Algorithm 2 BayesExp [Lattimore, 2013]; Algorithm 3 Thompson Sampling [Leike et al., 2016]; Algorithm 4 MDL Agent [Lattimore and Hutter, 2011].
Open Source Code	Yes	Our third contribution is to present a portable and extensible open-source software framework [footnote 1] for experimenting with, and demonstrating, URL algorithms. ... [Footnote 1] The framework is named AIXIJS; the source code can be found at http://github.com/aslanides/aixijs.
Open Datasets	No	We run our experiments on a class of partially-observable gridworlds. ... One might consider constructing M by enumerating all N × N gridworlds of the class described above, but this is infeasible as the size of such an enumeration explodes combinatorially... Instead, we choose a discrete parametrization D that enumerates an interesting subset of these gridworlds. ... We use this to create the first of our model classes, M_loc. The second, M_Dirichlet, uses a factorized distribution rather than an explicit mixture to avoid this issue.
Dataset Splits	No	The paper describes the simulated gridworld environments and models used, but does not specify train/validation/test dataset splits. The experiments involve agent-environment interaction in a simulated setting rather than static dataset processing.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper mentions 'AIXIJS' as their framework but does not specify any software names with version numbers for dependencies (e.g., Python, PyTorch, etc.).
Experiment Setup	Yes	Except where otherwise stated, the following experiments were run on a 10 × 10 gridworld with a single dispenser with θ = 0.75. We average training score over 50 simulations for each agent configuration, and we use κ = 600 MCTS samples and a planning horizon of m = 6. In all cases, discounting is geometric with γ = 0.99. In all cases, the agents are initialized with a uniform prior.
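
The settings quoted in the Experiment Setup row can be gathered into one configuration object, which may help when trying to reproduce the runs. The following is a minimal Python sketch, not the AIXIJS implementation (which is written in JavaScript). Every class, field, and function name is hypothetical; only the numeric defaults come from the row above. The run_once callback stands in for a single agent-environment simulation, which is not reproduced here.

```python
from dataclasses import dataclass
import random
from typing import Callable, List


@dataclass(frozen=True)
class ExperimentSetup:
    """Defaults quoted in the Experiment Setup row; field names are hypothetical."""
    grid_size: int = 10            # N for the N x N gridworld
    dispenser_theta: float = 0.75  # theta of the single dispenser
    num_simulations: int = 50      # runs averaged per agent configuration
    mcts_samples: int = 600        # kappa
    planning_horizon: int = 6      # m
    gamma: float = 0.99            # geometric discount factor


def discounted_mass(gamma: float, horizon: int) -> float:
    """Sum of gamma**t for t = 0 .. horizon-1 (geometric discounting)."""
    return sum(gamma ** t for t in range(horizon))


def uniform_prior(num_models: int) -> List[float]:
    """Equal prior weight on each member of a finite model class."""
    return [1.0 / num_models] * num_models


def average_score(run_once: Callable[[random.Random, ExperimentSetup], float],
                  setup: ExperimentSetup, seed: int = 0) -> float:
    """Average the score returned by run_once over setup.num_simulations runs.

    run_once is a placeholder for one full simulation; the seed argument is an
    assumption added here for repeatability and is not stated in the source.
    """
    rng = random.Random(seed)
    scores = [run_once(rng, setup) for _ in range(setup.num_simulations)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    setup = ExperimentSetup()
    # Total discount weight the planner sees within its horizon of m steps.
    print(discounted_mass(setup.gamma, setup.planning_horizon))
    # Uniform prior over a hypothetical class of 4 candidate environments.
    print(uniform_prior(4))
```

With γ = 0.99 the discount weights over the m = 6 planning steps stay close to 1, so the planner treats rewards within its horizon almost equally.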