Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rationality, Optimism and Guarantees in General Reinforcement Learning

Authors: Peter Sunehag, Marcus Hutter

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this article, we present a top-down theoretical study of general reinforcement learning agents. We begin with rational agents with unlimited resources and then move to a setting where an agent can only maintain a limited number of hypotheses and optimizes plans over a horizon much shorter than what the agent designer actually wants. We axiomatize what is rational in such a setting in a manner that enables optimism, which is important to achieve systematic explorative behavior. Then, within the class of agents deemed rational, we achieve convergence and finite-error bounds.
Researcher Affiliation | Academia | Peter Sunehag EMAIL, Marcus Hutter EMAIL, Research School of Computer Science (RSISE BLD 115), The Australian National University, ACT 0200, Canberra, Australia. Editor: Laurent Orseau. The first author is now at Google DeepMind, London, UK.
Pseudocode | Yes | Algorithm 1: Optimistic-AIXI Agent (π); Algorithm 2: Optimistic Agent (π) for Deterministic Environments; Algorithm 3: Optimistic Agent (π) with Stochastic Finite Class; Algorithm 4: Optimistic agent with hypothesis-generation from Lattimore et al. (2013a)
Open Source Code | No | The paper describes theoretical agents and algorithms with proofs and bounds, but does not provide any explicit statement about releasing source code or a link to a code repository for the methods described.
Open Datasets | No | The paper is a theoretical study of reinforcement learning agents and does not present experimental results or use any specific datasets. Example 20 (Line environment) describes a conceptual environment used for illustration, not an empirical dataset.
Dataset Splits | No | The paper is a theoretical study and does not use any datasets for empirical evaluation. Therefore, there are no dataset splits to report.
Hardware Specification | No | The paper is a theoretical study focusing on mathematical proofs and algorithms for general reinforcement learning. It does not include an experimental section or any details about hardware used for computational tasks.
Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis and algorithm design. It does not mention any specific software dependencies with version numbers that would be needed to reproduce experiments.
Experiment Setup | No | The paper is purely theoretical, presenting axioms, frameworks, and proofs for reinforcement learning agents. It contains no experimental section, and therefore no experimental setup details or hyperparameters are provided.
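The Pseudocode row above lists optimistic agents that plan over a class of candidate environments. As a rough intuition for how such an agent behaves, here is a minimal sketch of optimism over a finite class of deterministic hypotheses: act on the plan that is best under the most favorable surviving hypothesis, then discard hypotheses the observation contradicts. All function names, the dict-based environment encoding, and the toy worlds below are illustrative assumptions, not the paper's formal Algorithms 1–4.

```python
from itertools import product

def plan_value(env, state, plan, horizon):
    """Total reward from following the fixed action sequence `plan`
    in deterministic hypothesis `env`, a dict (state, action) -> (next_state, reward)."""
    total = 0.0
    for a in plan[:horizon]:
        state, r = env[(state, a)]
        total += r
    return total

def optimistic_action(hypotheses, state, actions, horizon):
    """First action of the plan that is best under the most optimistic
    surviving hypothesis, i.e. max over (environment, plan) pairs."""
    best = None
    for env in hypotheses:
        for plan in product(actions, repeat=horizon):
            try:
                v = plan_value(env, state, plan, horizon)
            except KeyError:  # plan leaves this hypothesis's modeled state space
                continue
            if best is None or v > best[0]:
                best = (v, plan[0])
    return best[1]

def falsify(hypotheses, state, action, observed):
    """Keep only hypotheses consistent with the observed (next_state, reward)."""
    return [e for e in hypotheses if e.get((state, action)) == observed]

# Toy illustration: two candidate deterministic worlds over states {0, 1}.
env_true = {(0, 'a'): (1, 1.0), (0, 'b'): (0, 0.0),
            (1, 'a'): (1, 1.0), (1, 'b'): (0, 0.0)}
env_alt  = {(0, 'a'): (0, 0.0), (0, 'b'): (1, 2.0),
            (1, 'a'): (1, 2.0), (1, 'b'): (0, 0.0)}

a = optimistic_action([env_true, env_alt], 0, ['a', 'b'], horizon=2)  # optimism chases env_alt's promise
surviving = falsify([env_true, env_alt], 0, a, env_true[(0, a)])      # env_alt is contradicted and dropped
```

The optimistic choice is exactly what drives systematic exploration in the paper's setting: an over-promising hypothesis is either right or gets falsified by the experience it induces.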