Real-Time Symbolic Dynamic Programming
Authors: Luis Vianna, Leliane de Barros, Scott Sanner
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | RTSDP is empirically tested on three challenging domains: INVENTORY CONTROL, with continuously parametrized actions; RESERVOIR MANAGEMENT, with a nonlinear reward model; and TRAFFIC CONTROL, with nonlinear dynamics. Our results show that, given an initial state, RTSDP can solve finite-horizon HMDP problems faster and using far less memory than SDP. |
| Researcher Affiliation | Academia | Luis G. R. Vianna, IME USP, São Paulo, Brazil; Leliane N. de Barros, IME USP, São Paulo, Brazil; Scott Sanner, NICTA & ANU, Canberra, Australia |
| Pseudocode | Yes | Algorithm 1: SDP(HMDP M, H) ... Algorithm 2: RTDP(MDP M, s0, H, V) ... Algorithm 3: Region-update(HMDP, (b_c, x_c), V, h) |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is released or provide a direct link to it. |
| Open Datasets | No | The paper describes the problem domains (RESERVOIR MANAGEMENT, INVENTORY CONTROL, TRAFFIC CONTROL) and refers to a wiki page for 'complete description of the problems used in this paper', but it does not specify or provide access information for any publicly available datasets used for training or evaluation. |
| Dataset Splits | No | The paper describes the problem domains and the evaluation process but does not specify training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers used in the experiments. |
| Experiment Setup | Yes | For all of our experiments, the RTSDP value function was initialised with an admissible max cumulative reward heuristic, that is, $V^h(s) = h \cdot \max_{s,a} R(s,a)\ \forall s$. This is a very simple to compute and minimally informative heuristic... (Figure 4 caption) with H = 4. ... (Figure 4 text) the initial state is (200, 200). |
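
The heuristic quoted in the Experiment Setup row can be checked on a small example. The sketch below is hypothetical and not taken from the paper: it uses a toy tabular, undiscounted finite-horizon MDP (the names `STATES`, `ACTIONS`, `R`, `P`, `max_reward_heuristic`, and `optimal_values` are illustrative assumptions, not the paper's hybrid domains or code). It verifies that $V^h(s) = h \cdot \max_{s,a} R(s,a)$ never underestimates the optimal $h$-steps-to-go value computed by backward induction. That upper-bound property is what makes the initialisation admissible for a reward-maximising RTDP-style search.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): rewards R[s][a] and
# transition distributions P[s][a] = {next_state: probability}.
STATES = ["s0", "s1"]
ACTIONS = ["a0", "a1"]
R = {"s0": {"a0": 1.0, "a1": 0.0}, "s1": {"a0": 2.0, "a1": 5.0}}
P = {
    "s0": {"a0": {"s0": 1.0}, "a1": {"s1": 1.0}},
    "s1": {"a0": {"s0": 0.5, "s1": 0.5}, "a1": {"s0": 1.0}},
}


def max_reward_heuristic(h):
    """Initial value for h steps-to-go: h times the largest one-step reward."""
    r_max = max(R[s][a] for s, a in product(STATES, ACTIONS))
    return {s: h * r_max for s in STATES}


def optimal_values(H):
    """Exact undiscounted values V[h][s] for h = 0..H via backward induction."""
    V = [{s: 0.0 for s in STATES}]
    for _ in range(H):
        prev = V[-1]
        V.append({
            s: max(
                R[s][a] + sum(p * prev[t] for t, p in P[s][a].items())
                for a in ACTIONS
            )
            for s in STATES
        })
    return V


H = 4
V_star = optimal_values(H)
# Admissibility: the heuristic upper-bounds the optimal value at every horizon.
for h in range(H + 1):
    heuristic = max_reward_heuristic(h)
    assert all(heuristic[s] >= V_star[h][s] for s in STATES)
```

The bound holds regardless of the sign of the rewards, because each of the $h$ collected rewards is at most $\max_{s,a} R(s,a)$. As the quoted text notes, however, it is only minimally informative: here it assigns 20 to every state at $h = 4$, while the optimal values are 10 and 11.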