A Strongly Asymptotically Optimal Agent in General Environments

Authors: Michael K. Cohen, Elliot Catt, Marcus Hutter

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments in grid-worlds to compare the Inquisitive Reinforcement Learner to other weakly asymptotically optimal agents. From Section 5 (Experimental Results): "We compared Inq with other known weakly asymptotically optimal agents, Thompson sampling and BayesExp [Lattimore and Hutter, 2014a], in the grid-world environment using AIXIjs [Aslanides, 2017], which has previously been used to compare asymptotically optimal agents [Aslanides et al., 2017]. We tested in 10×10 grid-worlds and 20×20 grid-worlds, both with a single dispenser with probability of dispensing reward 0.75; that is, if the agent enters that cell, the probability of a reward of 1 is 0.75. Following the conventions of [Aslanides et al., 2017], we averaged over 50 simulations, used discount factor γ = 0.99, 600 MCTS samples, and a planning horizon of 6."
Researcher Affiliation | Academia | Michael K. Cohen, Elliot Catt and Marcus Hutter, Australian National University, {michael.cohen, elliot.carpentercatt, marcus.hutter}@anu.edu.au
Pseudocode | Yes | Algorithm 1: Inquisitive Reinforcement Learner's Policy (a hedged sketch of this policy appears after the table).
Open Source Code | Yes | "The code used for this experiment is available online at https://github.com/ejcatt/aixijs, and this version of Inq can be run in the browser at https://ejcatt.github.io/aixijs/demo.html#inq."
Open Datasets | Yes | "We compared Inq with other known weakly asymptotically optimal agents, Thompson sampling and BayesExp [Lattimore and Hutter, 2014a], in the grid-world environment using AIXIjs [Aslanides, 2017], which has previously been used to compare asymptotically optimal agents [Aslanides et al., 2017]. We tested in 10×10 grid-worlds and 20×20 grid-worlds, both with a single dispenser with probability of dispensing reward 0.75; that is, if the agent enters that cell, the probability of a reward of 1 is 0.75."
Dataset Splits | No | The paper describes experiments in simulated grid-world environments but does not specify traditional training, validation, or test splits in terms of percentages or sample counts, as is common for fixed datasets. The experimental setup instead involves continuous interaction within the simulated environment.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using AIXIjs [Aslanides, 2017] but does not provide specific version numbers for it or any other software dependency.
Experiment Setup | Yes | "Following the conventions of [Aslanides et al., 2017], we averaged over 50 simulations, used discount factor γ = 0.99, 600 MCTS samples, and a planning horizon of 6. We found that using small values for β, specifically β ≤ 1, worked well. For our experiments we chose β = 1." (A configuration sketch follows below.)
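To make the Pseudocode row concrete: the paper's central idea is that Inq explores in proportion to its expected information gain and otherwise acts Bayes-optimally. The TypeScript sketch below is a minimal rendering of that idea under those assumptions; it is not the authors' Algorithm 1 or the AIXIjs implementation, and the identifiers `expectedInfoGain`, `bayesOptimalAction`, `exploratoryAction`, and the `beta` scaling are illustrative.

```typescript
// Hedged sketch of an Inq-style policy step. All identifiers are
// illustrative assumptions, not the authors' actual code or API.

interface Agent {
  expectedInfoGain(): number;   // posterior-expected information gain of exploring
  bayesOptimalAction(): number; // exploit: act greedily w.r.t. the Bayes mixture
  exploratoryAction(): number;  // explore: an information-seeking action
}

function inqStep(agent: Agent, beta: number): number {
  // Explore with probability proportional to expected information
  // gain (capped at 1); otherwise act Bayes-optimally.
  const pExplore = Math.min(1, beta * agent.expectedInfoGain());
  return Math.random() < pExplore
    ? agent.exploratoryAction()
    : agent.bayesOptimalAction();
}
```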
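Similarly, the hyperparameters quoted in the Experiment Setup row would translate into an AIXIjs-style configuration roughly like the following. The field names are hypothetical (AIXIjs's actual config keys may differ); only the values come from the quoted text.

```typescript
// Hypothetical AIXIjs-style configuration mirroring the reported setup.
// Field names are assumptions; values are taken from the quoted paper text.
const inqExperiment = {
  environment: {
    type: "gridworld",
    size: 10,                                  // also run at 20 (10x10 and 20x20 grids)
    dispensers: [{ rewardProbability: 0.75 }], // single reward dispenser
  },
  agent: {
    type: "Inq",
    beta: 1,          // exploration-scaling parameter (chosen as 1)
    discount: 0.99,   // discount factor gamma
    mctsSamples: 600, // MCTS samples per planning step
    horizon: 6,       // planning horizon
  },
  runs: 50,           // results averaged over 50 simulations
};
```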