A Generalized Reduced Linear Program for Markov Decision Processes

Authors: Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution of this paper is the novel theoretical framework developed to obtain error bounds for any given GRLP. Central to our framework are two max-norm contraction operators. Our result theoretically justifies linear approximation of constraints. We discuss the implication of our results in the contexts of ADP and reinforcement learning. We also demonstrate via an example in the domain of controlled queues that the experiments conform to the theory." (The ALP-to-GRLP reduction is sketched after the table.)
Researcher Affiliation | Academia | "Chandrashekar Lakshminarayanan and Shalabh Bhatnagar, Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India. {chandrul,shalabh}@csa.iisc.ernet.in"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides mathematical definitions and equations.
Open Source Code | No | The paper does not provide any concrete access to source code, nor does it explicitly state that code will be released or is available.
Open Datasets | No | The paper describes a 'queuing model' used for experiments, but this is a simulated environment with specified parameters, not a pre-existing, publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes a simulated queuing model but does not specify any train/test/validation dataset splits. The evaluation is based on errors computed from the model's output.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., libraries, solvers, or programming languages), beyond its general mention of 'LP' solving and 'stochastic approximation'.
Experiment Setup | Yes | "We take up an example in the domain of controlled queues to show that experiments are in agreement with the theory developed. More specifically, we look at the error bounds for different constraint reduction schemes to demonstrate the fact that whenever ||Γ̂J − ΓJ|| is small, the GRLP solution is close to the optimal value function. The queuing system consists of n = 10^4 states and d = 4 actions. [...] We chose n = 10^4 because it was possible to solve both the GRLP and the exact LP (albeit with significant effort) so as to enumerate the approximation errors. [...] Accordingly, we first chose a smaller system, QS, with n = 10, d = 2, k = 2, m = 5, q(1) = 0.2, q(2) = 0.4, p = 0.2 and α = 0.98. [...] Next we consider a moderately large queuing system QL with n = 10^4 and d = 4 with q(1) = 0.2, q(2) = 0.4, q(3) = 0.6, q(4) = 0.8, p = 0.4 and α = 0.98. In the case of QL, we chose k = 4 (i.e., we used 1, s, s^2 and s^3 as basis vectors) and we chose Wa (defined in (14)), Wc, Wi and Wr with m = 50. We set c(s) = (1 − ζ)ζ^s, s = 1, ..., 9999, with ζ = 0.9 and ζ = 0.999 respectively." (A runnable sketch of the small system QS follows the table.)
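
For orientation, the ALP-to-GRLP reduction quoted in the Research Type row can be written out in standard approximate-LP notation (Φ the feature matrix, T the Bellman optimality operator, c > 0 the state-relevance weights); the symbols below are our rendering and may differ from the paper's:

```latex
% ALP over feature weights r: one constraint per (state, action) pair.
% J = \Phi r and J \le TJ unfold into (I - \alpha P_a)\Phi r \le g_a for
% every action a, which stacks into A r \le b with A \in \mathbb{R}^{nd \times k}.
\begin{align*}
\text{(ALP)}\quad  &\max_{r \in \mathbb{R}^k} \; c^\top \Phi r
   \quad \text{s.t.} \quad A r \le b,\\
\text{(GRLP)}\quad &\max_{r \in \mathbb{R}^k} \; c^\top \Phi r
   \quad \text{s.t.} \quad W^\top A \, r \le W^\top b,
   \qquad W \in \mathbb{R}^{nd \times m}_{+},\; m \ll nd.
\end{align*}
% Every GRLP row is a positive linear combination of ALP rows, so the
% nd constraints of the ALP collapse to m tractable ones.
```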
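To make the Experiment Setup row concrete, here is a minimal runnable sketch of the small system QS (n = 10, d = 2, k = 2, m = 5) in Python with NumPy/SciPy. The per-stage cost g(s, a) = s, the block-sum aggregation matrix W, and the box bounds on r are illustrative assumptions on our part; the paper's cost function and its Wa/Wc/Wi/Wr aggregation schemes may differ.

```python
# Sketch of the ALP/GRLP comparison on the small queue Q_S.
# ASSUMED (not from the paper): cost g(s,a) = s, block-sum W, box bounds.
import numpy as np
from scipy.optimize import linprog

n, d, alpha = 10, 2, 0.98            # states, actions, discount
q, p = [0.2, 0.4], 0.2               # service rates q(a), arrival prob. p

# Birth-death transition matrices P[a][s, s'] for the controlled queue.
P = np.zeros((d, n, n))
for a in range(d):
    for s in range(n):
        down = q[a] * (s > 0)        # departure
        up = p * (s < n - 1)         # arrival
        P[a, s, max(s - 1, 0)] += down
        P[a, s, min(s + 1, n - 1)] += up
        P[a, s, s] += 1.0 - down - up

g = np.tile(np.arange(n, dtype=float), (d, 1))   # ASSUMED cost g(s,a) = s

# Exact J* via value iteration: J <- min_a (g_a + alpha P_a J).
J = np.zeros(n)
for _ in range(20000):
    J = np.min(g + alpha * (P @ J), axis=0)

# ALP in stacked form A r <= b (one row per state-action pair):
# (I - alpha P_a) Phi r <= g_a, with objective max c^T Phi r.
zeta = 0.9
c = (1 - zeta) * zeta ** np.arange(n)            # state-relevance weights
Phi = np.vander(np.arange(n, dtype=float), 2, increasing=True)  # [1, s]
A = np.vstack([(np.eye(n) - alpha * P[a]) @ Phi for a in range(d)])
b = g.reshape(-1)

# GRLP: replace A r <= b by W A r <= W b with W >= 0 and m = 5 rows.
# Here W sums blocks of consecutive constraints -- one assumed choice,
# not necessarily any of the paper's W_a, W_c, W_i, W_r.
m = 5
W = np.zeros((m, n * d))
for i, block in enumerate(np.array_split(np.arange(n * d), m)):
    W[i, block] = 1.0

# Box bounds keep the relaxed LP bounded (a safeguard, not from the paper).
res = linprog(-(Phi.T @ c), A_ub=W @ A, b_ub=W @ b,
              bounds=[(-1e4, 1e4)] * Phi.shape[1])
assert res.success, res.message
J_grlp = Phi @ res.x
print("||J* - Phi r_GRLP||_inf =", np.max(np.abs(J - J_grlp)))
```

Running this prints the max-norm gap between the exact value function and the GRLP approximation; swapping in a different nonnegative W (finer or coarser blocks) lets one compare constraint-aggregation schemes in the spirit of the paper's experiments.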