Reinforcement Learning with a Corrupted Reward Channel

Authors: Tom Everitt, Victoria Krakovna, Laurent Orseau, Shane Legg

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the results are illustrated with some simple experiments (Section 6). We illustrate the theoretical results with some simple experiments on a gridworld containing some goal tiles with true reward 0.9 (indicated by yellow circles) and a corrupt reward tile with observed reward 1 and true reward 0 (indicated by a blue square). Average observed and true rewards are shown in Figure 3. (A hedged sketch of this gridworld follows the table.)
Researcher Affiliation | Collaboration | Tom Everitt (Australian National University, tom4everitt@gmail.com); Victoria Krakovna (DeepMind, vkrakovna@google.com); Laurent Orseau (DeepMind, lorseau@google.com); Shane Legg (DeepMind, legg@google.com). A footnote states that Marcus Hutter (ANU) should be recognised as fourth author.
Pseudocode | No | The paper describes conceptual algorithms such as Q-learning, softmax, and quantilising agents, but it does not include any structured pseudocode or algorithm blocks. (A hedged sketch of the quantilising choice rule follows the table.)
Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the methodology described, nor does it include a link to a code repository. It mentions using the AIXIjs framework, but does not mention releasing its own implementation.
Open Datasets | No | The paper describes experiments conducted 'on a gridworld containing some goal tiles'. This appears to be a custom-built environment, and no concrete access information (link, DOI, formal citation, or repository) for a publicly available dataset is provided.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper states that the implementation was done in the 'AIXIjs framework for reinforcement learning', but it does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions implementation in the 'AIXIjs framework', but it does not specify any version numbers for this framework or any other software dependencies needed to replicate the experiment.
Experiment Setup | Yes | The discounting factor is γ = 0.9. We run Q-learning with ϵ-greedy (ϵ = 0.1), softmax with temperature β = 2, and the quantilising agent with δ = 0.2, 0.5, 0.8 (where 0.8 = 1 − q/|S| = 1 − 5/25) for 100 runs with 1 million cycles. (A hedged setup sketch using these values follows the table.)
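
For concreteness, the gridworld quoted in the Research Type and Open Datasets rows can be sketched as follows. This is a minimal Python sketch, not the authors' AIXIjs implementation: the grid size, tile coordinates, start state, and action set are assumptions chosen only to match the quoted description of goal tiles with true reward 0.9 and a corrupt tile with observed reward 1 and true reward 0.

```python
class CorruptRewardGridworld:
    """Gridworld with a corrupted reward channel, as described in the
    assessment: goal tiles have true reward 0.9, and a single corrupt
    tile reports an observed reward of 1.0 while its true reward is 0.
    Grid size, tile positions, start state, and the action set are
    illustrative assumptions, not taken from the paper."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=5, goal_tiles=((0, 4), (4, 0), (4, 4)),
                 corrupt_tile=(0, 0), start=(2, 2)):
        self.size = size
        self.goal_tiles = set(goal_tiles)
        self.corrupt_tile = corrupt_tile
        self.state = start

    def step(self, action_idx):
        """Move the agent and return (next_state, observed_reward, true_reward)."""
        dr, dc = self.ACTIONS[action_idx]
        r = min(max(self.state[0] + dr, 0), self.size - 1)
        c = min(max(self.state[1] + dc, 0), self.size - 1)
        self.state = (r, c)
        # The agent only ever sees the observed reward; the true reward is
        # returned alongside it so that true performance can be tracked.
        if self.state == self.corrupt_tile:
            return self.state, 1.0, 0.0
        if self.state in self.goal_tiles:
            return self.state, 0.9, 0.9
        return self.state, 0.0, 0.0
```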
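
The quantilising agent mentioned in the Pseudocode and Experiment Setup rows can be illustrated with a short choice rule. This is a hedged sketch of the general idea (sample from the top δ-fraction by observed reward rather than maximise); the function name `quantilise` and the dictionary-of-rewards interface are inventions of this sketch, not the paper's API.

```python
import random

def quantilise(observed_reward, delta):
    """Pick uniformly at random from the top delta-fraction of states
    ranked by observed reward, rather than the single best-looking state.
    When only a few states are corrupt, most sampled states carry honest
    rewards, which is the intuition behind the paper's quantilising agent.
    The exact object being ranked (states vs. policies) and the tie-breaking
    are simplifying assumptions of this sketch."""
    ranked = sorted(observed_reward, key=observed_reward.get, reverse=True)
    top_k = max(1, int(delta * len(ranked)))
    return random.choice(ranked[:top_k])

# Example: with delta = 0.8 over 25 states, the rule samples uniformly
# from the 20 states with the highest observed reward.
```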
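
The Experiment Setup row lists the quoted hyperparameters (γ = 0.9, ϵ = 0.1, β = 2, δ ∈ {0.2, 0.5, 0.8}, 100 runs of 1 million cycles). A rough Q-learning harness using those values might look like the following; the learning rate, action count, softmax convention, and all function names are assumptions, and the environment is the `CorruptRewardGridworld` sketch above rather than the paper's AIXIjs gridworld.

```python
import math
import random
from collections import defaultdict

# Hyperparameters quoted in the assessment; the learning rate, number of
# actions, and the environment itself are illustrative assumptions.
GAMMA = 0.9                 # discount factor
EPSILON = 0.1               # epsilon-greedy exploration rate
BETA = 2.0                  # softmax temperature
DELTAS = (0.2, 0.5, 0.8)    # quantilising agent settings
RUNS, CYCLES = 100, 1_000_000
ALPHA = 0.1                 # learning rate (not stated in the excerpt)
N_ACTIONS = 4

def epsilon_greedy(q_values):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_values[a])

def softmax(q_values):
    """Sample an action with probability proportional to exp(q / BETA).
    The excerpt calls BETA a temperature; whether the paper divides or
    multiplies by it is not recoverable from the excerpt."""
    prefs = [math.exp(q / BETA) for q in q_values]
    threshold = random.random() * sum(prefs)
    acc = 0.0
    for action, p in enumerate(prefs):
        acc += p
        if acc >= threshold:
            return action
    return N_ACTIONS - 1

def q_learning_run(env, select_action, cycles=CYCLES):
    """One run of tabular Q-learning driven by the observed (possibly
    corrupt) reward, returning the average true reward actually earned."""
    Q = defaultdict(lambda: [0.0] * N_ACTIONS)
    state, total_true = env.state, 0.0
    for _ in range(cycles):
        action = select_action(Q[state])
        next_state, observed, true = env.step(action)
        target = observed + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state
        total_true += true
    return total_true / cycles
```

A full reproduction would average such runs over the 100 seeds and additionally evaluate the quantilising agent for each δ; that outer loop is omitted here.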