Real-Time Symbolic Dynamic Programming

Authors: Luis Vianna, Leliane de Barros, Scott Sanner

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | RTSDP is empirically tested on three challenging domains: INVENTORY CONTROL, with continuously parametrized actions; RESERVOIR MANAGEMENT, with a nonlinear reward model; and TRAFFIC CONTROL, with nonlinear dynamics. Our results show that, given an initial state, RTSDP can solve finite-horizon HMDP problems faster and using far less memory than SDP.
Researcher Affiliation | Academia | Luis G. R. Vianna, IME-USP, São Paulo, Brazil; Leliane N. de Barros, IME-USP, São Paulo, Brazil; Scott Sanner, NICTA & ANU, Canberra, Australia
Pseudocode | Yes | Algorithm 1: SDP(HMDP M, H) ... Algorithm 2: RTDP(MDP M, s_0, H, V) ... Algorithm 3: Region-update(HMDP, (b_c, x_c), V, h)
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is released or provide a direct link to it.
Open Datasets | No | The paper describes the problem domains (RESERVOIR MANAGEMENT, INVENTORY CONTROL, TRAFFIC CONTROL) and refers to a wiki page for a 'complete description of the problems used in this paper', but it does not specify or provide access information for any publicly available datasets used for training or evaluation.
Dataset Splits | No | The paper describes the problem domains and the evaluation process but does not specify training, validation, and test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory).
Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers used in the experiments.
Experiment Setup | Yes | For all of our experiments RTSDP value function was initialised with an admissible max cumulative reward heuristic, that is $V^h(s) = h \cdot \max_{s,a} R(s,a) \;\; \forall s$. This is a very simple to compute and minimally informative heuristic... (Figure 4 caption) with H = 4. ... (Figure 4 text) the initial state is (200, 200).
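Algorithm 2 (RTDP) and the max cumulative reward heuristic from the Experiment Setup row can be illustrated on a tiny discrete problem. The following is a minimal sketch, not the paper's RTSDP: it assumes a hypothetical 3-state, 2-action MDP with enumerated states instead of the symbolic hybrid (discrete and continuous) representation the paper uses, and the names `P`, `R`, `max_reward_heuristic`, `exact_values`, and `rtdp` are illustrative only. `exact_values` performs an exhaustive backup over every state, loosely analogous to Algorithm 1 (SDP), while `rtdp` only updates the (state, steps-to-go) pairs that its simulated trials from s_0 visit, starting from the admissible bound h · max R.

```python
import random

# Hypothetical toy MDP (illustrative only; not one of the paper's domains).
# P maps (state, action) to a list of (next_state, probability) pairs.
STATES = [0, 1, 2]
ACTIONS = ["a", "b"]
P = {
    (0, "a"): [(0, 0.5), (1, 0.5)],
    (0, "b"): [(2, 1.0)],
    (1, "a"): [(1, 0.5), (2, 0.5)],
    (1, "b"): [(0, 1.0)],
    (2, "a"): [(2, 1.0)],
    (2, "b"): [(0, 0.5), (1, 0.5)],
}
R = {(0, "a"): 1.0, (0, "b"): 0.0, (1, "a"): 2.0,
     (1, "b"): 0.5, (2, "a"): 0.0, (2, "b"): 3.0}


def max_reward_heuristic(h):
    """Admissible upper bound V^h(s) = h * max_{s,a} R(s, a), identical for every s."""
    return h * max(R.values())


def exact_values(H):
    """Exhaustive backward induction over all states (a discrete stand-in for SDP)."""
    V = [{s: 0.0 for s in STATES}]
    for _ in range(H):
        prev = V[-1]
        V.append({s: max(R[(s, a)] + sum(p * prev[t] for t, p in P[(s, a)])
                         for a in ACTIONS)
                  for s in STATES})
    return V  # V[h][s] = optimal expected reward with h steps to go


def rtdp(s0, H, trials, seed=0):
    """Finite-horizon RTDP from s0; only visited (state, h) pairs are stored."""
    V = {}
    rng = random.Random(seed)

    def value(s, h):
        if h == 0:
            return 0.0
        # Unvisited pairs fall back to the admissible heuristic.
        return V.get((s, h), max_reward_heuristic(h))

    def q(s, h, a):
        return R[(s, a)] + sum(p * value(t, h - 1) for t, p in P[(s, a)])

    for _ in range(trials):
        s, path = s0, []
        for h in range(H, 0, -1):
            qs = {a: q(s, h, a) for a in ACTIONS}
            best = max(qs, key=qs.get)   # greedy action w.r.t. the upper bound
            V[(s, h)] = qs[best]         # Bellman update at the visited state
            path.append((s, h))
            succ, probs = zip(*P[(s, best)])
            s = rng.choices(succ, weights=probs)[0]  # simulate one transition
        for s_v, h_v in reversed(path):  # extra backward pass along the trial
            V[(s_v, h_v)] = max(q(s_v, h_v, a) for a in ACTIONS)
    return value(s0, H), V


if __name__ == "__main__":
    H = 4
    v_rtdp, table = rtdp(0, H, trials=2000)
    print("RTDP V^4(0) =", v_rtdp, "| exact =", exact_values(H)[H][0])
    print("stored (state, h) pairs:", len(table), "of", len(STATES) * H)
```

Because every step reward is at most max R, the heuristic never underestimates the optimal value, and each stored estimate can only decrease as its successors are refined, so the RTDP value at s_0 approaches the exact value from above while pairs the greedy trials never reach are never stored. That reachability restriction is the intuition behind RTSDP's memory advantage over SDP given an initial state; this toy sketch does not reproduce the paper's results.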