Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Policy Gradient for Coherent Risk Measures
Authors: Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we illustrate our approach with a numerical example. We consider a trading agent that can invest in one of three assets (see Figure 1 for their distributions)... At each iteration, 10,000 samples were used for gradient estimation. |
| Researcher Affiliation | Collaboration | Aviv Tamar (UC Berkeley), Yinlam Chow (Stanford University), Mohammad Ghavamzadeh (Adobe Research & INRIA), Shie Mannor (Technion) |
| Pseudocode | No | The paper describes algorithmic steps but does not include structured pseudocode blocks or sections explicitly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | No | The paper does not provide a specific link to source code or explicitly state that the code for the methodology is being released or is available in supplementary materials. |
| Open Datasets | No | The paper describes a numerical illustration involving simulated asset returns with specified distributions (Normal and Pareto) but does not use a publicly available or open dataset with access information. |
| Dataset Splits | No | The paper describes training policies in a numerical illustration but does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or computational resources) used for running the numerical illustration or experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming language versions, library versions, or specific solver versions) needed to replicate the experiment. |
| Experiment Setup | Yes | We consider a trading agent that can invest in one of three assets... The returns of the first two assets, $A_1$ and $A_2$, are normally distributed: $A_1 \sim \mathcal{N}(1, 1)$ and $A_2 \sim \mathcal{N}(4, 6)$. The return of the third asset $A_3$ has a Pareto distribution: $f(z) = \frac{\alpha}{z^{\alpha+1}}$ for $z > 1$, with $\alpha = 1.5$... The agent selects an action randomly, with probability $P(A_i) \propto \exp(\theta_i)$, where $\theta \in \mathbb{R}^3$ is the policy parameter. We trained three different policies $\pi_1$, $\pi_2$, and $\pi_3$... At each iteration, 10,000 samples were used for gradient estimation. |
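
The experiment setup quoted in the last row can be sketched in a few lines of Python. This is a minimal illustration of the sampling step only, not the authors' code and not their gradient estimator. The function names `softmax_policy` and `sample_returns` are hypothetical, and the flag `second_param_is_variance` is an assumption: the quoted text does not say whether the second parameter of N(4, 6) is a variance or a standard deviation.

```python
import numpy as np

ALPHA = 1.5  # Pareto shape parameter quoted in the experiment setup


def softmax_policy(theta):
    """Action probabilities with P(A_i) proportional to exp(theta_i)."""
    shifted = theta - np.max(theta)  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()


def sample_returns(theta, n_samples, rng, second_param_is_variance=True):
    """Draw actions from the softmax policy and the return of each chosen asset.

    Assumption: the second parameter of N(mean, .) is read as a variance
    unless second_param_is_variance is False.
    """
    probs = softmax_policy(theta)
    actions = rng.choice(3, size=n_samples, p=probs)

    sd1 = 1.0  # variance 1 and standard deviation 1 coincide
    sd2 = np.sqrt(6.0) if second_param_is_variance else 6.0
    a1 = rng.normal(1.0, sd1, size=n_samples)
    a2 = rng.normal(4.0, sd2, size=n_samples)
    # Generator.pareto samples a Lomax (Pareto II) variable; adding 1 gives
    # the classical Pareto with density alpha / z**(alpha + 1) for z > 1.
    a3 = 1.0 + rng.pareto(ALPHA, size=n_samples)

    all_returns = np.stack([a1, a2, a3], axis=1)
    chosen = all_returns[np.arange(n_samples), actions]
    return actions, chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(3)  # uniform policy over the three assets
    actions, rewards = sample_returns(theta, 10_000, rng)  # 10,000 samples per iteration
    print(softmax_policy(theta), rewards.mean())
```

The batch of 10,000 draws mirrors the per-iteration sample size quoted above. With α = 1.5 the Pareto return has infinite variance, so sample means of the third asset can fluctuate noticeably between runs.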