Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rationality, Optimism and Guarantees in General Reinforcement Learning

Authors: Peter Sunehag, Marcus Hutter

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this article, we present a top-down theoretical study of general reinforcement learning agents. We begin with rational agents with unlimited resources and then move to a setting where an agent can only maintain a limited number of hypotheses and optimizes plans over a horizon much shorter than what the agent designer actually wants. We axiomatize what is rational in such a setting in a manner that enables optimism, which is important to achieve systematic explorative behavior. Then, within the class of agents deemed rational, we achieve convergence and finite-error bounds.
Researcher Affiliation | Academia | Peter Sunehag EMAIL, Marcus Hutter EMAIL, Research School of Computer Science (RSISE BLD 115), The Australian National University, ACT 0200, Canberra, Australia. Editor: Laurent Orseau. The first author is now at Google DeepMind, London, UK.
Pseudocode | Yes | Algorithm 1: Optimistic-AIXI Agent (π); Algorithm 2: Optimistic Agent (π) for Deterministic Environments; Algorithm 3: Optimistic Agent (π) with Stochastic Finite Class; Algorithm 4: Optimistic agent with hypothesis-generation from Lattimore et al. (2013a)
Open Source Code | No | The paper describes theoretical agents and algorithms with proofs and bounds, but does not provide any explicit statement about releasing source code or a link to a code repository for the methods described.
Open Datasets | No | The paper is a theoretical study of reinforcement learning agents and does not present experimental results or use any specific datasets. Example 20 (Line environment) describes a conceptual environment used for illustration, not an empirical dataset.
Dataset Splits | No | The paper is a theoretical study and does not use any datasets for empirical evaluation. Therefore, there are no dataset splits to report.
Hardware Specification | No | The paper is a theoretical study focusing on mathematical proofs and algorithms for general reinforcement learning. It does not include an experimental section or any details about hardware used for computational tasks.
Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and algorithm design. It does not mention any specific software dependencies with version numbers that would be needed to reproduce experiments.
Experiment Setup | No | The paper is purely theoretical, presenting axioms, frameworks, and proofs for reinforcement learning agents. It contains no experimental section, and therefore no experimental setup details or hyperparameters are provided.
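The Pseudocode row above lists optimistic agents that plan over a class of candidate environments. As a rough intuition for how such an agent behaves, here is a minimal sketch of optimism over a finite class of deterministic hypotheses: act on the plan that is best under the most favorable surviving hypothesis, then discard hypotheses the observation contradicts. All function names, the dict-based environment encoding, and the toy worlds below are illustrative assumptions, not the paper's formal Algorithms 1–4.

```python
from itertools import product

def plan_value(env, state, plan, horizon):
    """Total reward from following the fixed action sequence `plan`
    in deterministic hypothesis `env`, a dict (state, action) -> (next_state, reward)."""
    total = 0.0
    for a in plan[:horizon]:
        state, r = env[(state, a)]
        total += r
    return total

def optimistic_action(hypotheses, state, actions, horizon):
    """First action of the plan that is best under the most optimistic
    surviving hypothesis, i.e. max over (environment, plan) pairs."""
    best = None
    for env in hypotheses:
        for plan in product(actions, repeat=horizon):
            try:
                v = plan_value(env, state, plan, horizon)
            except KeyError:  # plan leaves this hypothesis's modeled state space
                continue
            if best is None or v > best[0]:
                best = (v, plan[0])
    return best[1]

def falsify(hypotheses, state, action, observed):
    """Keep only hypotheses consistent with the observed (next_state, reward)."""
    return [e for e in hypotheses if e.get((state, action)) == observed]

# Toy illustration: two candidate deterministic worlds over states {0, 1}.
env_true = {(0, 'a'): (1, 1.0), (0, 'b'): (0, 0.0),
            (1, 'a'): (1, 1.0), (1, 'b'): (0, 0.0)}
env_alt  = {(0, 'a'): (0, 0.0), (0, 'b'): (1, 2.0),
            (1, 'a'): (1, 2.0), (1, 'b'): (0, 0.0)}

a = optimistic_action([env_true, env_alt], 0, ['a', 'b'], horizon=2)  # optimism chases env_alt's promise
surviving = falsify([env_true, env_alt], 0, a, env_true[(0, a)])      # env_alt is contradicted and dropped
```

The optimistic choice is exactly what drives systematic exploration in the paper's setting: an over-promising hypothesis is either right or gets falsified by the experience it induces.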