Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

Authors: Ju-Hyun Kim, Seungki Min

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method. We conduct two numerical experiments to evaluate our suggested algorithm (PCVaR) in a comparison with the other two competing algorithms GCVaR (Tamar et al., 2015b), and a naïve version of PCVaR (NCVaR) that does not employ the predictive tail probabilities.
Researcher Affiliation | Academia | Department of Industrial and Systems Engineering, KAIST, Daejeon, South Korea. Correspondence to: Seungki Min <skmin@kaist.ac.kr>.
Pseudocode | Yes | Algorithm 1: Predictive CVaR Policy Gradient
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Following Han et al. (2023), we consider the intraday pair trading of two stocks and the use of Tiingo dataset, obtained through Tiingo End-Of-Day API: https://api.tiingo.com/documentation/iex
Dataset Splits | No | The paper mentions using "the first 330 days of data for the initial training of the trading strategy, and evaluate the strategy during the remaining days, while periodically re-optimizing it every other days using the prior ten days of data." It describes how the data are used for training and evaluation but does not specify formal validation splits or percentages (e.g., an 80/10/10 split).
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, or memory) for running its experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in its implementation; even general mentions such as "Python" or "PyTorch" are missing.
Experiment Setup | Yes | $\theta$ learning rate (all): $\alpha_\theta = 0.005$. $\eta$ learning rate (NCVaR & PCVaR only): $\alpha_\eta = 0.1$. $\phi$ learning rate (PCVaR only): $\alpha_\phi = 0.01$. Batch size: $B = 16$. We introduce a prediction model with 12-dim parameters, $\phi = (\phi_1, \phi_2) \in \mathbb{R}^6 \times \mathbb{R}^6$, such that $f_\phi(x, c) = \mathbb{I}\{c < 0\}\, B_5^{\phi_1}(x/21) + \mathbb{I}\{c \ge 0\}\, B_5^{\phi_2}(x/21)$, where $B_5^{\phi}(\cdot)$ is a Bernstein polynomial of degree 5 with coefficients $\phi$, and use constant Lagrangian multipliers, $\lambda_L = \lambda_M = 0.3$.
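The prediction model quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code (none is released). The function names `bernstein` and `f_phi` are hypothetical, the second branch is assumed to cover c >= 0 (the complement of c < 0), and x and c are treated as plain numbers because the row does not define them or state their ranges.

```python
from math import comb

DEGREE = 5        # Bernstein degree stated in the setup (6 coefficients per branch)
X_SCALE = 21.0    # the stated formula evaluates the polynomial at x / 21


def bernstein(coeffs, t):
    """Evaluate sum_k coeffs[k] * C(n, k) * t**k * (1 - t)**(n - k), with n = len(coeffs) - 1."""
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * t**k * (1.0 - t) ** (n - k)
        for k, c in enumerate(coeffs)
    )


def f_phi(phi1, phi2, x, c):
    """Hypothetical sketch of the model: phi1 is used when c < 0, phi2 otherwise (assumed c >= 0)."""
    if len(phi1) != DEGREE + 1 or len(phi2) != DEGREE + 1:
        raise ValueError("each coefficient vector needs DEGREE + 1 = 6 entries")
    coeffs = phi1 if c < 0 else phi2
    return bernstein(coeffs, x / X_SCALE)


# Illustrative coefficients only; these are not values from the paper.
phi1 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
phi2 = [0.9] * 6
value_negative_c = f_phi(phi1, phi2, 10.5, -1.0)
value_nonnegative_c = f_phi(phi1, phi2, 10.5, 1.0)
```

A Bernstein polynomial takes the value of its first coefficient at t = 0 and of its last coefficient at t = 1, and equal coefficients give a constant function. Keeping all six coefficients of a branch in [0, 1] therefore keeps the output in [0, 1] whenever x / 21 lies in [0, 1].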