Distributional Reinforcement Learning for Risk-Sensitive Policies
Authors: Shiau Hong Lim, Ilyas Malik
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaR-optimized policies. |
| Researcher Affiliation | Industry | Shiau Hong Lim, IBM Research, Singapore (shonglim@sg.ibm.com); Ilyas Malik, IBM Research, Singapore (malikilyas1996@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Policy execution for static CVaR for one episode ... Algorithm 2: Quantile Regression Distributional Q-Learning for static CVaR (a generic sketch of the quantile-regression update appears below the table). |
| Open Source Code | Yes | Additional details and results, as well as the complete code to reproduce our results, can be found in the supplementary material. |
| Open Datasets | No | The paper mentions using "actual daily closing prices for the top 10 Dow components from 2005 to 2019" and creating a "stock price simulator" based on cited work, but does not provide a direct link, DOI, or specific repository information for accessing the exact dataset used. |
| Dataset Splits | No | The paper specifies training and testing periods for the option trading task ("Prices from 2005-2015 are used for training and prices from 2016-2019 for testing."), but it does not explicitly mention a separate validation split or dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or other computational resources used for experiments. |
| Software Dependencies | Yes | We use the implementation by Fujita et al. (2021) and made the slight modifications needed for Algorithms 1 and 2. |
| Experiment Setup | Yes | Unless otherwise stated, we implement Algorithms 1 and 2 and represent our policies using a neural network with two hidden layers, with ReLU activation. All our experiments use Adam as the stochastic gradient optimizer. For each action, the output consists of N = 100 quantile values. ... We use γ = 0.9 for this task. (A minimal sketch of this setup follows the table.) |
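
To make the reported setup concrete, here is a minimal PyTorch sketch of a quantile-distributional Q-network matching the described architecture: two hidden ReLU layers and N = 100 quantile outputs per action, with greedy action selection under a CVaR criterion estimated from the predicted quantiles. This is an illustrative reconstruction under stated assumptions, not the authors' released code: the hidden width (64), the risk level `alpha`, and the names `QuantileQNetwork` and `cvar_greedy_action` are hypothetical, and the paper's Algorithm 1 (policy execution for static CVaR) involves additional bookkeeping not shown here.

```python
# Hypothetical reconstruction of the reported architecture; not the
# authors' supplementary code. Two hidden ReLU layers and N = 100
# quantile outputs per action are reported in the paper; hidden width
# 64 and alpha = 0.1 are assumptions.
import torch
import torch.nn as nn

N_QUANTILES = 100  # "the output consists of N = 100 quantile values"

class QuantileQNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * N_QUANTILES),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Quantile estimates of the return distribution,
        # shape (batch, n_actions, N_QUANTILES).
        return self.net(obs).view(-1, self.n_actions, N_QUANTILES)

def cvar_greedy_action(net: QuantileQNetwork, obs: torch.Tensor,
                       alpha: float = 0.1) -> int:
    """Pick the action maximizing an empirical CVaR_alpha: the mean of
    the lowest ceil(alpha * N) predicted quantiles (the left tail)."""
    with torch.no_grad():
        # Predicted quantiles are not guaranteed monotone, so sort first.
        quantiles, _ = net(obs.unsqueeze(0)).sort(dim=-1)
        k = max(1, int(alpha * N_QUANTILES))
        cvar = quantiles[..., :k].mean(dim=-1)  # (1, n_actions)
        return int(cvar.argmax(dim=-1).item())
```

As in the paper, Adam would serve as the optimizer, e.g. `torch.optim.Adam(net.parameters())`.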
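
Algorithm 2 is described as quantile-regression distributional Q-learning; the generic quantile Huber loss of QR-DQN (Dabney et al., 2018) that such an algorithm minimizes per transition looks roughly as follows. This is the standard loss, not the paper's exact static-CVaR variant; `kappa = 1.0` and the tensor shapes are conventional assumptions.

```python
# Generic QR-DQN quantile Huber loss (Dabney et al., 2018) as a sketch
# of the regression step in Algorithm 2; not the paper's exact update.
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred:   (batch, N) predicted quantiles for the taken action
    target: (batch, N) Bellman-target quantile samples (detached)"""
    n = pred.shape[1]
    # Quantile midpoints tau_i = (2i + 1) / (2N).
    taus = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors td[b, i, j] = target_j - pred_i, shape (batch, N, N).
    td = target.unsqueeze(1) - pred.unsqueeze(2)
    huber = F.huber_loss(pred.unsqueeze(2).expand_as(td),
                         target.unsqueeze(1).expand_as(td),
                         reduction="none", delta=kappa)
    # Asymmetric quantile weight |tau_i - 1{td < 0}| penalizes over- and
    # under-estimation of each quantile differently.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()
```

In training, `pred` would come from the online network for the actions actually taken and `target` from the distributional Bellman backup through a target network, with gradients flowing only into `pred`.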