reproducibilityindex.ai

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Authors: Prashanth L.A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvari

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application. Figures 4(b)-4(d) present the histogram of the CPT-values from the test phase for AVG-SPSA, EUT-SPSA and CPT-SPSA, respectively.
Researcher Affiliation	Academia	Prashanth L.A. PRASHLA@ISR.UMD.EDU Institute for Systems Research, University of Maryland Cheng Jie CJIE@MATH.UMD.EDU Department of Mathematics, University of Maryland Michael Fu MFU@ISR.UMD.EDU Robert H. Smith School of Business & Institute for Systems Research, University of Maryland Steve Marcus MARCUS@UMD.EDU Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland Csaba Szepesvári SZEPESVA@CS.UALBERTA.CA Department of Computing Science, University of Alberta
Pseudocode	Yes	Algorithm 1: CPT-value estimation for Hölder continuous weights. Algorithm 2: Structure of CPT-SPSA-G algorithm.
Open Source Code	Yes	The experiments are performed using the GLD traffic simulator (Wiering et al., 2004) and the implementation is available at https://bitbucket.org/prashla/rl-gld.
Open Datasets	No	The paper mentions using the GLD traffic simulator for generating data, but it does not specify a publicly available dataset with concrete access information (link, DOI, repository, or citation).
Dataset Splits	No	The paper mentions a "training phase" and a "test phase" but does not explicitly describe a validation set or any specific dataset split percentages or counts for training, validation, and test sets. It states: "a training phase where we run each algorithm for 200 iterations, with each iteration involving two perturbed simulations, each of trajectory length 500. This is followed by a test phase where we fix the policy for each algorithm and 100 independent simulations of the MDP (each with a trajectory length of 1000) are performed."
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies	No	The paper mentions the "GLD traffic simulator (Wiering et al., 2004)" but does not provide specific ancillary software details with version numbers (e.g., libraries, frameworks, or programming language versions).
Experiment Setup	Yes	For both CPT-SPSA and EUT-SPSA, we set the utility functions (see (1)) as follows: $u^+(x) = \lvert x \rvert^{\sigma}$ and $u^-(x) = \lambda \lvert x \rvert^{\sigma}$, where $\lambda = 2.25$ and $\sigma = 0.88$. For CPT-SPSA, we set the weights as follows: $w^+(p) = \frac{p^{\eta_1}}{(p^{\eta_1} + (1-p)^{\eta_1})^{1/\eta_1}}$, $w^-(p) = \frac{p^{\eta_2}}{(p^{\eta_2} + (1-p)^{\eta_2})^{1/\eta_2}}$, where $\eta_1 = 0.61$ and $\eta_2 = 0.69$. For all the algorithms, motivated by standard guidelines (see Spall 2005), we set $\delta_n = 1.9/n^{0.101}$ and $a_n = 1/(n + 50)$. The initial point $\theta_0$ is the $d$-dimensional vector of ones and, for all $i$, the operator $\Gamma_i$ projects $\theta_i$ onto the set [0.1, 10.0]. The experiments involve two phases: first, a training phase where we run each algorithm for 200 iterations, with each iteration involving two perturbed simulations, each of trajectory length 500.
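
The settings quoted in the Experiment Setup row can be written out as a minimal Python sketch. It reads the weight functions as the Tversky-Kahneman form p^η / (p^η + (1 - p)^η)^(1/η) and the projection as a coordinate-wise clip to [0.1, 10.0]. All function and constant names below are hypothetical, chosen for illustration; they are not taken from the rl-gld repository, and the sketch covers only the stated hyperparameters, not the CPT-SPSA algorithm itself.

```python
from typing import List

# Constants as quoted in the Experiment Setup row.
LAMBDA = 2.25    # loss-aversion factor in u^-
SIGMA = 0.88     # utility exponent
ETA_GAIN = 0.61  # eta_1, used by w^+
ETA_LOSS = 0.69  # eta_2, used by w^-


def u_plus(x: float) -> float:
    """Gain utility u^+(x) = |x|^sigma."""
    return abs(x) ** SIGMA


def u_minus(x: float) -> float:
    """Loss utility u^-(x) = lambda * |x|^sigma."""
    return LAMBDA * abs(x) ** SIGMA


def _weight(p: float, eta: float) -> float:
    """Probability weight p^eta / (p^eta + (1 - p)^eta)^(1/eta)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p ** eta / (p ** eta + (1.0 - p) ** eta) ** (1.0 / eta)


def w_plus(p: float) -> float:
    """Weight function applied to gains (eta_1 = 0.61)."""
    return _weight(p, ETA_GAIN)


def w_minus(p: float) -> float:
    """Weight function applied to losses (eta_2 = 0.69)."""
    return _weight(p, ETA_LOSS)


def perturbation_size(n: int) -> float:
    """Perturbation constant delta_n = 1.9 / n^0.101, for n >= 1."""
    return 1.9 / n ** 0.101


def step_size(n: int) -> float:
    """Step size a_n = 1 / (n + 50)."""
    return 1.0 / (n + 50)


def project(theta: List[float], lo: float = 0.1, hi: float = 10.0) -> List[float]:
    """Clip each coordinate of theta to [lo, hi] (assumed reading of Gamma_i)."""
    return [min(max(t, lo), hi) for t in theta]


def initial_point(d: int) -> List[float]:
    """theta_0: the d-dimensional vector of ones."""
    return [1.0] * d
```

With these parameters the weight functions overweight small probabilities and underweight moderate ones: `w_plus(0.01)` exceeds 0.01, while `w_plus(0.5)` falls below 0.5.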