A Generalized Reduced Linear Program for Markov Decision Processes

Authors: Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution of this paper is the novel theoretical framework developed to obtain error bounds for any given GRLP. Central to our framework are two max-norm contraction operators. Our result theoretically justifies linear approximation of constraints. We discuss the implication of our results in the contexts of ADP and reinforcement learning. We also demonstrate via an example in the domain of controlled queues that the experiments conform to the theory." (The ALP-to-GRLP reduction is sketched after the table.)
Researcher Affiliation | Academia | "Chandrashekar Lakshminarayanan and Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India. {chandrul,shalabh}@csa.iisc.ernet.in"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides mathematical definitions and equations.
Open Source Code | No | The paper does not provide any concrete access to source code, nor does it explicitly state that code will be released or is available.
Open Datasets | No | The paper describes a 'queuing model' used for experiments, but this is a simulated environment with specified parameters, not a pre-existing, publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes a simulated queuing model but does not specify any train/test/validation dataset splits. The evaluation is based on errors computed from the model's output.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., libraries, solvers, or programming languages), beyond its general mention of 'LP' solving and 'stochastic approximation'.
Experiment Setup | Yes | "We take up an example in the domain of controlled queues to show that experiments are in agreement with the theory developed. More specifically, we look at the error bounds for different constraint reduction schemes to demonstrate the fact that whenever ||Γ̂J − ΓJ|| is small, the GRLP solution is close to the optimal value function. The queuing system consists of n = 10^4 states and d = 4 actions. [...] We chose n = 10^4 because it was possible to solve both the GRLP and the exact LP (albeit with significant effort) so as to enumerate the approximation errors. [...] Accordingly, we first chose a smaller system, QS, with n = 10, d = 2, k = 2, m = 5, q(1) = 0.2, q(2) = 0.4, p = 0.2 and α = 0.98. [...] Next we consider a moderately large queuing system QL with n = 10^4 and d = 4 with q(1) = 0.2, q(2) = 0.4, q(3) = 0.6, q(4) = 0.8, p = 0.4 and α = 0.98. In the case of QL, we chose k = 4 (i.e., we used 1, s, s^2 and s^3 as basis vectors) and we chose Wa (defined in (14)), Wc, Wi and Wr with m = 50. We set c(s) = (1 − ζ)ζ^s, s = 1, ..., 9999, with ζ = 0.9 and ζ = 0.999 respectively." (A runnable sketch of the small system QS follows the table.)
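
For orientation, the ALP-to-GRLP reduction quoted in the Research Type row can be written out in standard approximate-LP notation (Φ the feature matrix, T the Bellman optimality operator, c > 0 the state-relevance weights); the symbols below are our rendering and may differ from the paper's:

```latex
% ALP over feature weights r: one constraint per (state, action) pair.
% J = \Phi r and J \le TJ unfold into (I - \alpha P_a)\Phi r \le g_a for
% every action a, which stacks into A r \le b with A \in \mathbb{R}^{nd \times k}.
\begin{align*}
\text{(ALP)}\quad  &\max_{r \in \mathbb{R}^k} \; c^\top \Phi r
   \quad \text{s.t.} \quad A r \le b,\\
\text{(GRLP)}\quad &\max_{r \in \mathbb{R}^k} \; c^\top \Phi r
   \quad \text{s.t.} \quad W^\top A \, r \le W^\top b,
   \qquad W \in \mathbb{R}^{nd \times m}_{+},\; m \ll nd.
\end{align*}
% Every GRLP row is a positive linear combination of ALP rows, so the
% nd constraints of the ALP collapse to m tractable ones.
```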
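To make the Experiment Setup row concrete, here is a minimal runnable sketch of the small system QS (n = 10, d = 2, k = 2, m = 5) in Python with NumPy/SciPy. The per-stage cost g(s, a) = s, the block-sum aggregation matrix W, and the box bounds on r are illustrative assumptions on our part; the paper's cost function and its Wa/Wc/Wi/Wr aggregation schemes may differ.

```python
# Sketch of the ALP/GRLP comparison on the small queue Q_S.
# ASSUMED (not from the paper): cost g(s,a) = s, block-sum W, box bounds.
import numpy as np
from scipy.optimize import linprog

n, d, alpha = 10, 2, 0.98            # states, actions, discount
q, p = [0.2, 0.4], 0.2               # service rates q(a), arrival prob. p

# Birth-death transition matrices P[a][s, s'] for the controlled queue.
P = np.zeros((d, n, n))
for a in range(d):
    for s in range(n):
        down = q[a] * (s > 0)        # departure
        up = p * (s < n - 1)         # arrival
        P[a, s, max(s - 1, 0)] += down
        P[a, s, min(s + 1, n - 1)] += up
        P[a, s, s] += 1.0 - down - up

g = np.tile(np.arange(n, dtype=float), (d, 1))   # ASSUMED cost g(s,a) = s

# Exact J* via value iteration: J <- min_a (g_a + alpha P_a J).
J = np.zeros(n)
for _ in range(20000):
    J = np.min(g + alpha * (P @ J), axis=0)

# ALP in stacked form A r <= b (one row per state-action pair):
# (I - alpha P_a) Phi r <= g_a, with objective max c^T Phi r.
zeta = 0.9
c = (1 - zeta) * zeta ** np.arange(n)            # state-relevance weights
Phi = np.vander(np.arange(n, dtype=float), 2, increasing=True)  # [1, s]
A = np.vstack([(np.eye(n) - alpha * P[a]) @ Phi for a in range(d)])
b = g.reshape(-1)

# GRLP: replace A r <= b by W A r <= W b with W >= 0 and m = 5 rows.
# Here W sums blocks of consecutive constraints -- one assumed choice,
# not necessarily any of the paper's W_a, W_c, W_i, W_r.
m = 5
W = np.zeros((m, n * d))
for i, block in enumerate(np.array_split(np.arange(n * d), m)):
    W[i, block] = 1.0

# Box bounds keep the relaxed LP bounded (a safeguard, not from the paper).
res = linprog(-(Phi.T @ c), A_ub=W @ A, b_ub=W @ b,
              bounds=[(-1e4, 1e4)] * Phi.shape[1])
assert res.success, res.message
J_grlp = Phi @ res.x
print("||J* - Phi r_GRLP||_inf =", np.max(np.abs(J - J_grlp)))
```

Running this prints the max-norm gap between the exact value function and the GRLP approximation; swapping in a different nonnegative W (finer or coarser blocks) lets one compare constraint-aggregation schemes in the spirit of the paper's experiments.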