A Strongly Asymptotically Optimal Agent in General Environments
Authors: Michael K. Cohen, Elliot Catt, Marcus Hutter
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments in grid-worlds to compare the Inquisitive Reinforcement Learner to other weakly asymptotically optimal agents. [Section 5, Experimental Results] We compared Inq with other known weakly asymptotically optimal agents, Thompson sampling and BayesExp [Lattimore and Hutter, 2014a], in the grid-world environment using AIXIjs [Aslanides, 2017], which has previously been used to compare asymptotically optimal agents [Aslanides et al., 2017]. We tested in 10×10 grid-worlds and 20×20 grid-worlds, both with a single dispenser with probability of dispensing reward 0.75; that is, if the agent enters that cell, the probability of a reward of 1 is 0.75. Following the conventions of [Aslanides et al., 2017] we averaged over 50 simulations, used discount factor γ = 0.99, 600 MCTS samples, and planning horizon of 6. |
| Researcher Affiliation | Academia | Michael K. Cohen, Elliot Catt and Marcus Hutter, Australian National University, {michael.cohen, elliot.carpentercatt, marcus.hutter}@anu.edu.au |
| Pseudocode | Yes | Algorithm 1: Inquisitive Reinforcement Learner's Policy |
| Open Source Code | Yes | The code used for this experiment is available online at https://github.com/ejcatt/aixijs, and this version of Inq can be run in the browser at https://ejcatt.github.io/aixijs/demo.html#inq. |
| Open Datasets | Yes | We compared Inq with other known weakly asymptotically optimal agents, Thompson sampling and BayesExp [Lattimore and Hutter, 2014a], in the grid-world environment using AIXIjs [Aslanides, 2017], which has previously been used to compare asymptotically optimal agents [Aslanides et al., 2017]. We tested in 10×10 grid-worlds and 20×20 grid-worlds, both with a single dispenser with probability of dispensing reward 0.75; that is, if the agent enters that cell, the probability of a reward of 1 is 0.75. |
| Dataset Splits | No | The paper describes experiments in simulated grid-world environments but does not specify training, validation, or test splits in terms of percentages or sample counts, as would be usual for a fixed dataset; the experimental setup instead involves continuous interaction within the simulated environment. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'AIXIjs [Aslanides, 2017]' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | Following the conventions of [Aslanides et al., 2017] we averaged over 50 simulations, used discount factor γ = 0.99, 600 MCTS samples, and planning horizon of 6. We found that using small values for , specifically 1 worked well. For our experiments we chose = 1. |
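
For orientation, below is a minimal Python sketch of the kind of grid-world experiment described in the Research Type and Experiment Setup rows: an N×N grid with a single dispenser that pays reward 1 with probability 0.75, discounted returns with γ = 0.99, and averaging over 50 simulations. The class, function, and policy names are illustrative assumptions, not the AIXIjs implementation, and the MCTS planning used by the actual agents (600 samples, planning horizon 6) is replaced here by a uniform random baseline.

```python
import random

class GridWorld:
    """Illustrative N x N grid-world with a single stochastic reward dispenser."""

    def __init__(self, size=10, dispense_prob=0.75, seed=None):
        self.size = size
        self.dispense_prob = dispense_prob
        self.rng = random.Random(seed)
        # Place the dispenser uniformly at random (placement scheme is assumed).
        self.dispenser = (self.rng.randrange(size), self.rng.randrange(size))
        self.pos = (0, 0)

    def step(self, action):
        # Assumed action set: 0=up, 1=down, 2=left, 3=right, 4=stay.
        dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0), (0, 0)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        # On the dispenser cell the agent receives reward 1 with probability 0.75.
        on_dispenser = self.pos == self.dispenser
        reward = 1 if on_dispenser and self.rng.random() < self.dispense_prob else 0
        return self.pos, reward

def discounted_return(rewards, gamma=0.99):
    # Geometric discounting with the gamma = 0.99 reported in the paper.
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def run_simulation(policy, size=10, horizon=1000, seed=None):
    env = GridWorld(size=size, seed=seed)
    obs, rewards = env.pos, []
    for _ in range(horizon):
        obs, r = env.step(policy(obs))
        rewards.append(r)
    return discounted_return(rewards)

def random_policy(obs):
    # Uniform random baseline; the paper's agents (Inq, Thompson sampling,
    # BayesExp) instead plan with MCTS (600 samples, horizon 6), omitted here.
    return random.randrange(5)

# Average over 50 independent simulations, mirroring the reporting convention.
scores = [run_simulation(random_policy, size=10, seed=i) for i in range(50)]
print("mean discounted return:", sum(scores) / len(scores))
```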