Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Tighter Value Function Bounds for Bayesian Reinforcement Learning

Authors: Kanghoon Lee, Kee-Eung Kim

AAAI 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach.
Researcher Affiliation	Academia	Kanghoon Lee and Kee-Eung Kim Department of Computer Science Korea Advanced Institute of Science and Technology Daejeon 305-701, Korea EMAIL and EMAIL
Pseudocode	Yes	Algorithm 1: AEMS-BRL Algorithm; Algorithm 2: Expand(s, b); Algorithm 3: Update Ancestor(s, b); Algorithm 4: Online Initial Bound Computation
Open Source Code	No	The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets	Yes	Chain (Strens 2000) consists of a 5 state linear chain with 2 actions. Double-loop (Dearden, Friedman, and Russell 1998) consists of two loops of length 5 with a shared starting state (9 states total) and 2 actions. Grid5 (Guez, Silver, and Dayan 2012) is a 2D grid of 25 states and 4 directional movement actions. Grid10 (Guez, Silver, and Dayan 2012) is an enlarged version of Grid5 with 100 states. Maze (Dearden, Friedman, and Russell 1998), consisting of 264 states and 4 actions
Dataset Splits	No	The paper describes experimental setups in terms of number of runs and time steps (e.g., "500 runs of 1000 time steps"), but does not specify dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification	No	The paper mentions "CPU time (sec/step)" but does not specify any particular hardware components such as CPU models, GPU models, or memory.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers.
Experiment Setup	Yes	In all experiments, we set γ = 0.95 for the search and used simple Dirichlet-Multinomial model with symmetric Dirichlet parameter α0 = 1/|S| except for Double-loop in which we used parameter α0 = 1. For the online bound initialization, we set η = 40 and ηmin = 30 in all experiments.
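
The experiment setup above mentions a Dirichlet-Multinomial model with a symmetric prior α0 = 1/|S|. As a rough illustration only, the following Python sketch shows how such a prior over transitions could be stored as pseudo-counts and updated after an observed transition. The function names, the array layout, and the use of NumPy are assumptions made for this example; they are not taken from the paper, which does not release code.

```python
import numpy as np

GAMMA = 0.95  # discount factor reported in the experiment setup


def symmetric_prior(n_states, n_actions, alpha0=None):
    """Dirichlet pseudo-counts with shape (S, A, S).

    Defaults to alpha0 = 1/|S|, the value reported for most domains.
    (Hypothetical helper; pass alpha0=1.0 for the Double-loop setting.)
    """
    if alpha0 is None:
        alpha0 = 1.0 / n_states
    return np.full((n_states, n_actions, n_states), alpha0, dtype=float)


def observe(counts, s, a, s_next):
    """Return a copy of the counts after observing the transition (s, a, s_next)."""
    updated = counts.copy()
    updated[s, a, s_next] += 1.0
    return updated


def mean_transition(counts):
    """Posterior mean transition probabilities: normalize counts over next states."""
    return counts / counts.sum(axis=-1, keepdims=True)


# Usage with the sizes reported for Chain: 5 states and 2 actions.
prior = symmetric_prior(n_states=5, n_actions=2)
posterior = observe(prior, s=0, a=0, s_next=1)
probs = mean_transition(posterior)
```

With α0 = 1/|S| the pseudo-counts for each state-action pair sum to 1, so a single observed transition already dominates the posterior mean. This is one way to see why the choice of α0 matters for how quickly the model commits to observed dynamics.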