Policy Gradient for Coherent Risk Measures
Authors: Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we illustrate our approach with a numerical example. We consider a trading agent that can invest in one of three assets (see Figure 1 for their distributions)... At each iteration, 10,000 samples were used for gradient estimation. |
| Researcher Affiliation | Collaboration | Aviv Tamar (UC Berkeley, avivt@berkeley.edu); Yinlam Chow (Stanford University, ychow@stanford.edu); Mohammad Ghavamzadeh (Adobe Research & INRIA, mohammad.ghavamzadeh@inria.fr); Shie Mannor (Technion, shie@ee.technion.ac.il) |
| Pseudocode | No | The paper describes algorithmic steps but does not include structured pseudocode blocks or sections explicitly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide a specific link to source code or explicitly state that the code for the methodology is being released or is available in supplementary materials. |
| Open Datasets | No | The paper describes a numerical illustration involving simulated asset returns with specified distributions (Normal and Pareto) but does not use a publicly available or open dataset with access information. |
| Dataset Splits | No | The paper describes training policies in a numerical illustration but does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or computational resources) used for running the numerical illustration or experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming language versions, library versions, or specific solver versions) needed to replicate the experiment. |
| Experiment Setup | Yes | We consider a trading agent that can invest in one of three assets... The returns of the first two assets, A1 and A2, are normally distributed: A1 ∼ N(1, 1) and A2 ∼ N(4, 6). The return of the third asset A3 has a Pareto distribution: f(z) = α/z^(α+1) for z > 1, with α = 1.5... The agent selects an action randomly, with probability P(Ai) ∝ exp(θi), where θ ∈ ℝ³ is the policy parameter. We trained three different policies π1, π2, and π3... At each iteration, 10,000 samples were used for gradient estimation. |
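The quoted setup is concrete enough to sketch in code. The following is a minimal reproduction sketch, not the authors' implementation: it assumes N(4, 6) means mean 4 and standard deviation 6 (the excerpt does not say whether 6 is a variance or a standard deviation), and it estimates the gradient of the plain expected return with a score-function (REINFORCE) estimator, whereas the paper's algorithm differentiates a coherent risk measure.

```python
import numpy as np

ALPHA = 1.5          # Pareto shape parameter, as given in the paper
N_SAMPLES = 10_000   # samples per gradient estimate, as stated in the paper

def sample_return(asset, rng):
    """Draw one return for the chosen asset, matching the quoted distributions."""
    if asset == 0:
        return rng.normal(1.0, 1.0)    # A1 ~ N(1, 1)
    if asset == 1:
        return rng.normal(4.0, 6.0)    # A2 ~ N(4, 6); std assumed, see lead-in
    return rng.pareto(ALPHA) + 1.0     # A3: classical Pareto, f(z) = a/z^(a+1), z > 1

def policy_probs(theta):
    """Softmax policy P(A_i) proportional to exp(theta_i) over the three assets."""
    z = np.exp(theta - theta.max())    # shift for numerical stability
    return z / z.sum()

def pg_estimate(theta, rng, n=N_SAMPLES):
    """Score-function estimate of the gradient of E[return].

    The paper's risk-sensitive gradient is more involved; this sketch only
    shows the sampling machinery the experiment relies on.
    """
    probs = policy_probs(theta)
    grad = np.zeros(3)
    for _ in range(n):
        a = rng.choice(3, p=probs)
        r = sample_return(a, rng)
        score = -probs.copy()
        score[a] += 1.0                # grad of log pi(a) for a softmax policy
        grad += r * score
    return grad / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(3)
    for step in range(50):             # a few plain gradient-ascent steps
        theta += 0.05 * pg_estimate(theta, rng)
    print("action probabilities:", policy_probs(theta))
```

Under this risk-neutral objective the ascent drifts toward A2, the highest-mean asset (means are 1, 4, and α/(α-1) = 3 respectively); the paper's contribution is replacing that expectation with a coherent risk measure, which trades A2's mean against its variance and A3's heavy tail.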