Policy Gradient for Coherent Risk Measures

Authors: Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "In this section, we illustrate our approach with a numerical example. We consider a trading agent that can invest in one of three assets (see Figure 1 for their distributions)... At each iteration, 10,000 samples were used for gradient estimation."
Researcher Affiliation | Collaboration | Aviv Tamar (UC Berkeley, avivt@berkeley.edu); Yinlam Chow (Stanford University, ychow@stanford.edu); Mohammad Ghavamzadeh (Adobe Research & INRIA, mohammad.ghavamzadeh@inria.fr); Shie Mannor (Technion, shie@ee.technion.ac.il)
Pseudocode | No | The paper describes algorithmic steps but does not include structured pseudocode blocks or sections explicitly labeled "Algorithm" or "Pseudocode".
Open Source Code | No | The paper does not provide a link to source code or explicitly state that code for the methodology is released or available in supplementary materials.
Open Datasets | No | The paper's numerical illustration uses simulated asset returns with specified distributions (Normal and Pareto) rather than a publicly available dataset with access information.
Dataset Splits | No | The paper trains policies in a numerical illustration but does not specify training, validation, or test splits (e.g., percentages, sample counts, or predefined splits).
Hardware Specification | No | The paper does not report hardware details (e.g., GPU/CPU models, memory, or other computational resources) used to run the numerical illustration or experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., programming language, library, or solver versions) needed to replicate the experiment.
Experiment Setup | Yes | "We consider a trading agent that can invest in one of three assets... The returns of the first two assets, A1 and A2, are normally distributed: A1 ∼ N(1, 1) and A2 ∼ N(4, 6). The return of the third asset, A3, has a Pareto distribution: f(z) = α/z^(α+1) for z > 1, with α = 1.5... The agent selects an action randomly, with probability P(Ai) ∝ exp(θi), where θ ∈ R^3 is the policy parameter. We trained three different policies π1, π2, and π3... At each iteration, 10,000 samples were used for gradient estimation."
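The experiment setup quoted above is concrete enough to sketch in code. The Python/NumPy sketch below is illustrative only, not the paper's algorithm: it assumes N(4, 6) names the variance of A2, substitutes a plain expected-return objective for the paper's coherent-risk objectives, and the learning rate and iteration count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 1.5         # Pareto shape parameter, as in the paper
N_SAMPLES = 10_000  # samples per gradient estimate, as in the paper

def sample_return(asset):
    """One return sample: A1 ~ N(1, 1), A2 ~ N(4, 6), A3 ~ Pareto(1.5)."""
    if asset == 0:
        return rng.normal(1.0, 1.0)
    if asset == 1:
        return rng.normal(4.0, np.sqrt(6.0))  # assumes 6 is the variance
    return rng.pareto(ALPHA) + 1.0  # density f(z) = alpha / z**(alpha + 1), z > 1

def policy(theta):
    """Softmax policy over the three assets: P(A_i) proportional to exp(theta_i)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def estimate_gradient(theta, n=N_SAMPLES):
    """Score-function (REINFORCE) Monte Carlo estimate of the policy gradient."""
    p = policy(theta)
    assets = rng.choice(3, size=n, p=p)
    rets = np.fromiter((sample_return(a) for a in assets), dtype=float, count=n)
    scores = np.eye(3)[assets] - p  # gradient of log-softmax at the chosen asset
    return (rets[:, None] * scores).mean(axis=0)

theta = np.zeros(3)
for _ in range(200):                           # hypothetical iteration count
    theta += 0.05 * estimate_gradient(theta)   # hypothetical step size
print("P(A1), P(A2), P(A3) =", policy(theta))
```

Under this risk-neutral stand-in objective the policy drifts toward A2 (mean 4, versus 1 for A1 and α/(α−1) = 3 for A3); coherent-risk objectives of the kind the paper studies, such as CVaR, would instead penalize A2's variance and A3's heavy tail.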