Policy Gradient for Coherent Risk Measures

Authors: Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "In this section, we illustrate our approach with a numerical example. We consider a trading agent that can invest in one of three assets (see Figure 1 for their distributions)... At each iteration, 10,000 samples were used for gradient estimation."
Researcher Affiliation | Collaboration | Aviv Tamar (UC Berkeley, avivt@berkeley.edu); Yinlam Chow (Stanford University, ychow@stanford.edu); Mohammad Ghavamzadeh (Adobe Research & INRIA, mohammad.ghavamzadeh@inria.fr); Shie Mannor (Technion, shie@ee.technion.ac.il)
Pseudocode | No | The paper describes algorithmic steps but does not include structured pseudocode blocks or sections explicitly labeled "Algorithm" or "Pseudocode".
Open Source Code | No | The paper does not provide a link to source code or explicitly state that code for the methodology is released or available in supplementary materials.
Open Datasets | No | The paper's numerical illustration uses simulated asset returns with specified distributions (Normal and Pareto) rather than a publicly available dataset with access information.
Dataset Splits | No | The paper trains policies in a numerical illustration but does not specify training, validation, or test splits (e.g., percentages, sample counts, or predefined splits).
Hardware Specification | No | The paper does not report hardware details (e.g., GPU/CPU models, memory, or other computational resources) used to run the numerical illustration or experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., programming language, library, or solver versions) needed to replicate the experiment.
Experiment Setup | Yes | "We consider a trading agent that can invest in one of three assets... The returns of the first two assets, A1 and A2, are normally distributed: A1 ∼ N(1, 1) and A2 ∼ N(4, 6). The return of the third asset, A3, has a Pareto distribution: f(z) = α/z^(α+1) for z > 1, with α = 1.5... The agent selects an action randomly, with probability P(Ai) ∝ exp(θi), where θ ∈ R^3 is the policy parameter. We trained three different policies π1, π2, and π3... At each iteration, 10,000 samples were used for gradient estimation."
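The experiment setup quoted above is concrete enough to sketch in code. The Python/NumPy sketch below is illustrative only, not the paper's algorithm: it assumes N(4, 6) names the variance of A2, substitutes a plain expected-return objective for the paper's coherent-risk objectives, and the learning rate and iteration count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 1.5         # Pareto shape parameter, as in the paper
N_SAMPLES = 10_000  # samples per gradient estimate, as in the paper

def sample_return(asset):
    """One return sample: A1 ~ N(1, 1), A2 ~ N(4, 6), A3 ~ Pareto(1.5)."""
    if asset == 0:
        return rng.normal(1.0, 1.0)
    if asset == 1:
        return rng.normal(4.0, np.sqrt(6.0))  # assumes 6 is the variance
    return rng.pareto(ALPHA) + 1.0  # density f(z) = alpha / z**(alpha + 1), z > 1

def policy(theta):
    """Softmax policy over the three assets: P(A_i) proportional to exp(theta_i)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def estimate_gradient(theta, n=N_SAMPLES):
    """Score-function (REINFORCE) Monte Carlo estimate of the policy gradient."""
    p = policy(theta)
    assets = rng.choice(3, size=n, p=p)
    rets = np.fromiter((sample_return(a) for a in assets), dtype=float, count=n)
    scores = np.eye(3)[assets] - p  # gradient of log-softmax at the chosen asset
    return (rets[:, None] * scores).mean(axis=0)

theta = np.zeros(3)
for _ in range(200):                           # hypothetical iteration count
    theta += 0.05 * estimate_gradient(theta)   # hypothetical step size
print("P(A1), P(A2), P(A3) =", policy(theta))
```

Under this risk-neutral stand-in objective the policy drifts toward A2 (mean 4, versus 1 for A1 and α/(α−1) = 3 for A3); coherent-risk objectives of the kind the paper studies, such as CVaR, would instead penalize A2's variance and A3's heavy tail.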