Tighter Value Function Bounds for Bayesian Reinforcement Learning
Authors: Kanghoon Lee, Kee-Eung Kim
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach. |
| Researcher Affiliation | Academia | Kanghoon Lee and Kee-Eung Kim, Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea; khlee@ai.kaist.ac.kr and kekim@cs.kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: AEMS-BRL Algorithm; Algorithm 2: Expand(s, b); Algorithm 3: Update Ancestor(s, b); Algorithm 4: Online Initial Bound Computation |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Chain (Strens 2000) consists of a 5 state linear chain with 2 actions. Double-loop (Dearden, Friedman, and Russell 1998) consists of two loops of length 5 with a shared starting state (9 states total) and 2 actions. Grid5 (Guez, Silver, and Dayan 2012) is a 2D grid of 25 states and 4 directional movement actions. Grid10 (Guez, Silver, and Dayan 2012) is an enlarged version of Grid5 with 100 states. Maze (Dearden, Friedman, and Russell 1998), consisting of 264 states and 4 actions |
| Dataset Splits | No | The paper describes experimental setups in terms of number of runs and time steps (e.g., "500 runs of 1000 time steps"), but does not specify dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper mentions "CPU time (sec/step)" but does not specify any particular hardware components such as CPU models, GPU models, or memory. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments, we set γ = 0.95 for the search and used simple Dirichlet-Multinomial model with symmetric Dirichlet parameter α0 = 1/\|S\| except for Double-loop in which we used parameter α0 = 1. For the online bound initialization, we set η = 40 and ηmin = 30 in all experiments. |
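
Since no code is released with the paper, the Dirichlet-Multinomial transition model named in the experiment setup can be illustrated with a minimal sketch. It assumes a tabular MDP with integer-indexed states and actions and the symmetric prior α0 = 1/|S| quoted above. The class and method names (`DirichletMultinomialModel`, `update`, `mean_transition`, `sample_transition`) are hypothetical and are not taken from the authors' implementation. The search algorithm and the bound computation are not shown.

```python
import numpy as np


class DirichletMultinomialModel:
    """Hypothetical tabular Dirichlet-Multinomial model over transitions.

    Each (state, action) pair has an independent symmetric Dirichlet prior
    with parameter alpha0 over next states. This is an illustrative sketch,
    not the authors' code.
    """

    def __init__(self, n_states, n_actions, alpha0=None):
        # Default follows the setup quoted above: alpha0 = 1 / |S|.
        self.alpha0 = 1.0 / n_states if alpha0 is None else alpha0
        # counts[s, a, s_next] = number of observed transitions.
        self.counts = np.zeros((n_states, n_actions, n_states))

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[s, a, s_next] += 1.0

    def mean_transition(self, s, a):
        """Posterior mean of the next-state distribution for (s, a)."""
        posterior = self.counts[s, a] + self.alpha0
        return posterior / posterior.sum()

    def sample_transition(self, s, a, rng):
        """Draw one next-state distribution from the posterior for (s, a)."""
        return rng.dirichlet(self.counts[s, a] + self.alpha0)


# Usage on a 5-state, 2-action problem (the size of the Chain domain).
model = DirichletMultinomialModel(n_states=5, n_actions=2)
prior_mean = model.mean_transition(0, 0)      # uniform before any data
for _ in range(3):
    model.update(0, 0, 1)                     # observe 0 --a0--> 1 three times
posterior_mean = model.mean_transition(0, 0)  # mass shifts toward state 1
```

With α0 = 1/|S| the prior pseudo-counts for each (state, action) pair sum to 1, so a handful of observations already dominates the posterior mean. Passing `alpha0=1.0` gives the setting quoted for Double-loop.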