Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Authors: Ju-Hyun Kim, Seungki Min
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method. We conduct two numerical experiments to evaluate our suggested algorithm (PCVaR) in comparison with two competing algorithms: GCVaR (Tamar et al., 2015b) and a naïve version of PCVaR (NCVaR) that does not employ the predictive tail probabilities. |
| Researcher Affiliation | Academia | 1Department of Industrial and Systems Engineering, KAIST, Daejeon, South Korea. Correspondence to: Seungki Min <skmin@kaist.ac.kr>. |
| Pseudocode | Yes | Algorithm 1: Predictive CVaR Policy Gradient |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Following Han et al. (2023), we consider the intraday pair trading of two stocks and the use of the Tiingo dataset, obtained through the Tiingo End-Of-Day API: https://api.tiingo.com/documentation/iex |
| Dataset Splits | No | The paper mentions using "the first 330 days of data for the initial training of the trading strategy" and evaluating the strategy during the remaining days, "while periodically re-optimizing it every other day using the prior ten days of data." It describes how the data are used for training and evaluation but does not specify formal validation splits or percentages (e.g., an 80/10/10 split). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, or memory) for running its experiments. |
| Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in its implementation; even general mentions such as "Python" or "PyTorch" are absent. |
| Experiment Setup | Yes | θ learning rate (all): α_θ = 0.005. η learning rate (NCVaR & PCVaR only): α_η = 0.1. ϕ learning rate (PCVaR only): α_ϕ = 0.01. Batch size: B = 16. We introduce a prediction model with 12-dim parameters, ϕ = (ϕ₁, ϕ₂) ∈ ℝ⁶ × ℝ⁶, such that f_ϕ(x, c) = I{c < 0}·B₅^{ϕ₁}(x/21) + I{c ≥ 0}·B₅^{ϕ₂}(x/21), where B₅^ϕ(·) is a Bernstein polynomial of degree 5 with coefficients ϕ, and use constant Lagrangian multipliers, λ_L = λ_M = 0.3. |
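The prediction model quoted in the Experiment Setup row combines two degree-5 Bernstein polynomials, selected by the sign of the cost c. The sketch below is a minimal illustration of that structure, not the paper's implementation; the function names (`bernstein_poly`, `f_phi`) are our own, and the `x / 21` scaling is taken from the quoted setup as a mapping of the state onto [0, 1].

```python
from math import comb

def bernstein_poly(coeffs, t, degree=5):
    """Evaluate a degree-5 Bernstein polynomial with 6 coefficients:
    B(t) = sum_k coeffs[k] * C(5, k) * t^k * (1 - t)^(5 - k)."""
    return sum(
        c * comb(degree, k) * t**k * (1 - t) ** (degree - k)
        for k, c in enumerate(coeffs)
    )

def f_phi(phi1, phi2, x, c):
    """Tail-probability prediction model f_phi(x, c): picks the
    polynomial with coefficients phi1 when c < 0 and phi2 otherwise,
    evaluated at the rescaled state x / 21."""
    t = x / 21
    return bernstein_poly(phi1, t) if c < 0 else bernstein_poly(phi2, t)
```

Because the Bernstein basis functions form a partition of unity, setting all six coefficients to 1 makes the polynomial identically 1, which is a convenient sanity check for an implementation like this.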