Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Distributional Reinforcement Learning for Risk-Sensitive Policies
Authors: Shiau Hong Lim, Ilyas Malik
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaR-optimized policies. |
| Researcher Affiliation | Industry | Shiau Hong Lim IBM Research, Singapore EMAIL Ilyas Malik IBM Research, Singapore EMAIL |
| Pseudocode | Yes | Algorithm 1 Policy execution for static CVaR for one episode...Algorithm 2 Quantile Regression Distributional Q-Learning for static CVaR |
| Open Source Code | Yes | Additional details and results, as well as the complete code to reproduce our results can be found in the supplementary material. |
| Open Datasets | No | The paper mentions using "actual daily closing prices for the top 10 Dow components from 2005 to 2019" and creating a "stock price simulator" based on cited work, but does not provide a direct link, DOI, or specific repository information for accessing the exact dataset used. |
| Dataset Splits | No | The paper specifies training and testing periods for the option trading task ("Prices from 2005-2015 are used for training and prices from 2016-2019 for testing."), but it does not explicitly mention a separate validation split or dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or other computational resources used for experiments. |
| Software Dependencies | Yes | We use the implementation by Fujita et al. (2021) and made the slight modifications needed for Algorithms 1 and 2. |
| Experiment Setup | Yes | Unless otherwise stated, we implement Algorithm 1 and 2 and represent our policies using a neural network with two hidden layers, with ReLU activation. All our experiments use Adam as the stochastic gradient optimizer. For each action, the output consists of N = 100 quantile values. ... We use γ = 0.9 for this task. |
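
For readers unfamiliar with the quantities quoted in the table, the following is a minimal sketch of how a CVaR score might be read off a quantile-based return estimate such as the N = 100 quantile values per action mentioned above. It is a generic illustration, not the paper's Algorithm 1 or 2: the function names `cvar_from_quantiles` and `greedy_cvar_action`, the assumption of equally weighted quantiles, and the lower-tail convention (CVaR as the mean of the worst α fraction of returns) are all assumptions made here.

```python
import numpy as np


def cvar_from_quantiles(quantiles, alpha):
    """Approximate lower-tail CVaR at level alpha.

    Assumes the quantile values are equally weighted, so CVaR is
    approximated by the mean of the lowest ceil(alpha * N) values.
    """
    q = np.sort(np.asarray(quantiles, dtype=float))
    # Small tolerance guards against floating-point overshoot in alpha * N.
    k = max(1, int(np.ceil(alpha * q.size - 1e-9)))
    return float(q[:k].mean())


def greedy_cvar_action(quantiles_per_action, alpha):
    """Pick the action whose estimated return distribution has the highest CVaR."""
    scores = [cvar_from_quantiles(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))


# Hypothetical example: a safe action versus one with a higher mean but a bad tail.
safe = np.full(100, 1.0)
risky = np.concatenate([np.full(10, -5.0), np.full(90, 2.0)])

risk_averse_choice = greedy_cvar_action([safe, risky], alpha=0.1)
risk_neutral_choice = greedy_cvar_action([safe, risky], alpha=1.0)
```

With α = 1.0 the score reduces to the ordinary mean, so the risky action is preferred; with α = 0.1 only the worst tenth of outcomes counts, so the safe action is preferred.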