reproducibilityindex.ai

Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

Authors: Ju-Hyun Kim, Seungki Min

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method. We conduct two numerical experiments to evaluate our suggested algorithm (PCVaR) in comparison with two competing algorithms: GCVaR (Tamar et al., 2015b) and a naïve version of PCVaR (NCVaR) that does not employ the predictive tail probabilities.
Researcher Affiliation	Academia	Department of Industrial and Systems Engineering, KAIST, Daejeon, South Korea. Correspondence to: Seungki Min <skmin@kaist.ac.kr>.
Pseudocode	Yes	Algorithm 1 Predictive CVaR Policy Gradient
Open Source Code	No	The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	Following Han et al. (2023), we consider the intraday pair trading of two stocks and use the Tiingo dataset, obtained through the Tiingo End-Of-Day API: https://api.tiingo.com/documentation/iex
Dataset Splits	No	The paper mentions using "the first 330 days of data for the initial training of the trading strategy, and evaluate the strategy during the remaining days, while periodically re-optimizing it every other day using the prior ten days of data." This describes how data are used for training and evaluation, but the paper does not specify a formal validation split or split percentages (e.g., an 80/10/10 split).
Hardware Specification	No	The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types, or memory) for running its experiments.
Software Dependencies	No	The paper does not list its software dependencies or their version numbers; even general mentions of tools such as "Python" or "PyTorch" are missing.
Experiment Setup	Yes	$\theta$ learning rate (all): $\alpha_\theta = 0.005$. $\eta$ learning rate (NCVaR and PCVaR only): $\alpha_\eta = 0.1$. $\phi$ learning rate (PCVaR only): $\alpha_\phi = 0.01$. Batch size: $B = 16$. We introduce a prediction model with 12-dimensional parameters, $\phi = (\phi_1, \phi_2) \in \mathbb{R}^6 \times \mathbb{R}^6$, such that $f_\phi(x, c) = \mathbb{I}\{c < 0\}\, B^5_{\phi_1}(x/21) + \mathbb{I}\{c \ge 0\}\, B^5_{\phi_2}(x/21)$, where $B^5_\phi(\cdot)$ is a Bernstein polynomial of degree 5 with coefficients $\phi$, and use constant Lagrangian multipliers, $\lambda_L = \lambda_M = 0.3$.
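
The prediction model in the Experiment Setup entry can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code (which is not released): it assumes $x$ lies in $[0, 21]$ so that $x/21 \in [0, 1]$, and the names `bernstein`, `f_phi`, and `scale` are hypothetical. It evaluates the standard degree-5 Bernstein form $B^5_\phi(u) = \sum_{k=0}^{5} \phi_k \binom{5}{k} u^k (1-u)^{5-k}$ and selects $\phi_1$ or $\phi_2$ by the sign of $c$.

```python
from math import comb


def bernstein(coeffs, u):
    """Evaluate the Bernstein polynomial of degree len(coeffs) - 1 at u in [0, 1]."""
    n = len(coeffs) - 1
    return sum(
        c * comb(n, k) * u**k * (1 - u) ** (n - k)
        for k, c in enumerate(coeffs)
    )


def f_phi(phi1, phi2, x, c, scale=21.0):
    """Hypothetical sketch of f_phi(x, c): use phi1 when c < 0, otherwise phi2.

    Assumes x is in [0, scale], so that x / scale falls in [0, 1].
    """
    if len(phi1) != 6 or len(phi2) != 6:
        raise ValueError("expected six coefficients each (degree-5 Bernstein)")
    u = x / scale
    return bernstein(phi1 if c < 0 else phi2, u)


# Illustrative coefficients only; the paper's learned values are not given here.
phi1 = [k / 5 for k in range(6)]  # linearly spaced: reproduces B(u) = u
phi2 = [0.25] * 6                 # constant coefficients: B(u) = 0.25
print(f_phi(phi1, phi2, 10.5, c=-0.3))  # 0.5  (u = 0.5, branch phi1)
print(f_phi(phi1, phi2, 10.5, c=0.2))   # 0.25 (branch phi2)
```

Because the Bernstein basis is nonnegative on $[0, 1]$ and sums to one, coefficients chosen in $[0, 1]$ keep $f_\phi$ in $[0, 1]$, which would suit a model that outputs tail probabilities; whether the paper constrains $\phi$ this way is not stated here.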
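
The evaluation protocol quoted in the Dataset Splits entry (an initial fit on the first 330 days, then periodic re-optimization on a ten-day lookback) can be made concrete with the sketch below. It is one possible reading, not the paper's code: it treats "every other day" as every two trading days, assumes the first re-optimization happens two days into evaluation, indexes days from 0, and excludes the current day from the lookback window. `rolling_schedule` and its parameters are hypothetical names, and the total number of days is left as an input because it is not given here.

```python
def rolling_schedule(n_days, init_train=330, lookback=10, every=2):
    """Map each evaluation day to the half-open window [start, end) of the fit it uses.

    Hypothetical reading of the protocol: the first `init_train` days form the
    initial fit; afterwards the strategy is re-fit every `every` days on the
    `lookback` days immediately before the current day.
    """
    if n_days <= init_train:
        raise ValueError("no days left for evaluation")
    plan = {}
    window = (0, init_train)  # initial training window
    for day in range(init_train, n_days):
        offset = day - init_train
        if offset > 0 and offset % every == 0:
            # Re-optimize using only the prior `lookback` days (no look-ahead).
            window = (day - lookback, day)
        plan[day] = window
    return plan


for day, (start, end) in rolling_schedule(336).items():
    print(day, start, end)
# 330 0 330
# 331 0 330
# 332 322 332
# 333 322 332
# 334 324 334
# 335 324 334
```

Making the window end exclusive at the current day keeps each re-fit strictly out-of-sample for the day it trades, which is the property a reproduction would most need to preserve.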