Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Authors: Prashanth L.A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvari
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application. Figures 4(b)-4(d) present the histograms of the CPT-values from the test phase for AVG-SPSA, EUT-SPSA and CPT-SPSA, respectively. |
| Researcher Affiliation | Academia | Prashanth L.A. (PRASHLA@ISR.UMD.EDU), Institute for Systems Research, University of Maryland; Cheng Jie (CJIE@MATH.UMD.EDU), Department of Mathematics, University of Maryland; Michael Fu (MFU@ISR.UMD.EDU), Robert H. Smith School of Business & Institute for Systems Research, University of Maryland; Steve Marcus (MARCUS@UMD.EDU), Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland; Csaba Szepesvári (SZEPESVA@CS.UALBERTA.CA), Department of Computing Science, University of Alberta |
| Pseudocode | Yes | Algorithm 1: CPT-value estimation for Hölder continuous weights; Algorithm 2: Structure of the CPT-SPSA-G algorithm. |
| Open Source Code | Yes | The experiments are performed using the GLD traffic simulator (Wiering et al., 2004) and the implementation is available at https://bitbucket.org/prashla/rl-gld. |
| Open Datasets | No | The paper mentions using the GLD traffic simulator for generating data, but it does not specify a publicly available dataset with concrete access information (link, DOI, repository, or citation). |
| Dataset Splits | No | The paper mentions a "training phase" and a "test phase" but does not explicitly describe a validation set or any specific dataset split percentages or counts for training, validation, and test sets. It states: "a training phase where we run each algorithm for 200 iterations, with each iteration involving two perturbed simulations, each of trajectory length 500. This is followed by a test phase where we fix the policy for each algorithm and 100 independent simulations of the MDP (each with a trajectory length of 1000) are performed." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions the "GLD traffic simulator (Wiering et al., 2004)" but does not provide specific ancillary software details with version numbers (e.g., libraries, frameworks, or programming language versions). |
| Experiment Setup | Yes | For both CPT-SPSA and EUT-SPSA, we set the utility functions (see (1)) as follows: $u^+(x) = \lvert x\rvert^{\sigma}$ and $u^-(x) = \lambda\lvert x\rvert^{\sigma}$, where $\lambda = 2.25$ and $\sigma = 0.88$. For CPT-SPSA, we set the weights as follows: $w^+(p) = \frac{p^{\eta_1}}{(p^{\eta_1} + (1-p)^{\eta_1})^{1/\eta_1}}$, $w^-(p) = \frac{p^{\eta_2}}{(p^{\eta_2} + (1-p)^{\eta_2})^{1/\eta_2}}$, where $\eta_1 = 0.61$ and $\eta_2 = 0.69$. For all the algorithms, motivated by standard guidelines (see Spall 2005), we set $\delta_n = 1.9/n^{0.101}$ and $a_n = 1/(n + 50)$. The initial point $\theta_0$ is the $d$-dimensional vector of ones and, for all $i$, the operator $\Gamma_i$ projects $\theta_i$ onto the set $[0.1, 10.0]$. The experiments involve two phases: first, a training phase where we run each algorithm for 200 iterations, with each iteration involving two perturbed simulations, each of trajectory length 500. |
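
Algorithm 1 estimates a CPT-value from simulated samples. The sketch below shows one order-statistic estimator of this kind, using the utility and weight functions and the parameter values from the experiment setup. It rests on several assumptions. First, $u^+$ scores only gains ($x \ge 0$) and $u^-$ scores only losses ($x < 0$). Second, with the samples sorted as $X_{[1]} \le \dots \le X_{[n]}$, the estimate is $\sum_i u^+(X_{[i]})\,[w^+(\tfrac{n+1-i}{n}) - w^+(\tfrac{n-i}{n})] - \sum_i u^-(X_{[i]})\,[w^-(\tfrac{i}{n}) - w^-(\tfrac{i-1}{n})]$. Third, all function names are hypothetical and are not taken from the authors' rl-gld code.

```python
from typing import Callable, Sequence

# Parameter values from the experiment setup.
LAMBDA = 2.25    # loss-aversion coefficient (lambda)
SIGMA = 0.88     # utility exponent (sigma)
ETA_GAIN = 0.61  # eta_1, weight parameter for gains
ETA_LOSS = 0.69  # eta_2, weight parameter for losses


def tk_weight(p: float, eta: float) -> float:
    """w(p) = p^eta / (p^eta + (1 - p)^eta)^(1/eta); w(0) = 0 and w(1) = 1."""
    num = p ** eta
    return num / (num + (1.0 - p) ** eta) ** (1.0 / eta)


def w_gain(p: float) -> float:
    return tk_weight(p, ETA_GAIN)


def w_loss(p: float) -> float:
    return tk_weight(p, ETA_LOSS)


def u_gain(x: float) -> float:
    # Assumption: u^+ only scores gains, so it is zero for x < 0.
    return abs(x) ** SIGMA if x >= 0 else 0.0


def u_loss(x: float) -> float:
    # Assumption: u^- only scores losses, so it is zero for x >= 0.
    return LAMBDA * abs(x) ** SIGMA if x < 0 else 0.0


def cpt_estimate(
    samples: Sequence[float],
    u_plus: Callable[[float], float] = u_gain,
    u_minus: Callable[[float], float] = u_loss,
    w_plus: Callable[[float], float] = w_gain,
    w_minus: Callable[[float], float] = w_loss,
) -> float:
    """Hypothetical helper: CPT-value estimate from i.i.d. samples via order statistics."""
    xs = sorted(samples)  # X_[1] <= ... <= X_[n]
    n = len(xs)
    gains = 0.0
    losses = 0.0
    for i, x in enumerate(xs, start=1):
        # Gains: distort the empirical decumulative probability (n + 1 - i) / n.
        gains += u_plus(x) * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n))
        # Losses: distort the empirical cumulative probability i / n.
        losses += u_minus(x) * (w_minus(i / n) - w_minus((i - 1) / n))
    return gains - losses


if __name__ == "__main__":
    # A symmetric +/-1 gamble: loss aversion (lambda = 2.25) makes the CPT-value negative.
    print(cpt_estimate([-1.0, 1.0]))
```

With identity weights $w^\pm(p) = p$, each telescoping difference equals $1/n$. The estimate then reduces to the sample mean of $u^+(X) - u^-(X)$, which is an expected-utility quantity. If the utilities are also linear, it reduces further to the plain sample mean.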
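
Algorithm 2 (CPT-SPSA-G) is built around simultaneous perturbation stochastic approximation (SPSA). The sketch below is a projected two-sided SPSA ascent loop that uses the step sizes $\delta_n$ and $a_n$, the initial point $\theta_0$ and the projection set $[0.1, 10.0]$ reported above. It assumes the following:

- perturbations are Rademacher ($\pm 1$);
- the update is gradient ascent, since the CPT-value is maximized;
- each of the two perturbed simulations is represented by a single call to `objective`;
- a hypothetical quadratic objective stands in for the GLD simulator.

`spsa_maximize` and `project` are illustrative names and do not come from the paper or the rl-gld repository.

```python
import random
from typing import Callable, List, Sequence


def project(theta: Sequence[float], lo: float = 0.1, hi: float = 10.0) -> List[float]:
    """Coordinate-wise projection onto [lo, hi], playing the role of Gamma_i."""
    return [min(max(t, lo), hi) for t in theta]


def spsa_maximize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    iterations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: projected two-sided SPSA gradient ascent."""
    rng = random.Random(seed)
    theta = [1.0] * dim  # theta_0: the d-dimensional vector of ones
    for n in range(1, iterations + 1):
        delta_n = 1.9 / n ** 0.101  # perturbation size delta_n
        a_n = 1.0 / (n + 50)        # step size a_n
        # Assumption: Rademacher (+1/-1) perturbation directions.
        perturb = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        theta_plus = [t + delta_n * p for t, p in zip(theta, perturb)]
        theta_minus = [t - delta_n * p for t, p in zip(theta, perturb)]
        # Two perturbed evaluations per iteration, mirroring the two simulations.
        diff = objective(theta_plus) - objective(theta_minus)
        grad = [diff / (2.0 * delta_n * p) for p in perturb]
        # Ascent step followed by projection back onto [0.1, 10.0].
        theta = project([t + a_n * g for t, g in zip(theta, grad)])
    return theta


def toy_objective(theta: Sequence[float]) -> float:
    # Hypothetical stand-in for a simulated CPT-value: concave, peaked at 3.0.
    return -sum((t - 3.0) ** 2 for t in theta)


if __name__ == "__main__":
    print(spsa_maximize(toy_objective, dim=2))
```

In the paper's setting, each call to `objective` would itself be a CPT-value estimate computed from simulated trajectories, so the gradient estimate is noisy. SPSA's two-sided difference needs only two evaluations per iteration regardless of `dim`, which is the usual reason for choosing it when each evaluation is a costly simulation.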