Distributional Reinforcement Learning for Risk-Sensitive Policies

Authors: Shiau Hong Lim, Ilyas Malik

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaR-optimized policies.
Researcher Affiliation | Industry | Shiau Hong Lim, IBM Research, Singapore (shonglim@sg.ibm.com); Ilyas Malik, IBM Research, Singapore (malikilyas1996@gmail.com)
Pseudocode | Yes | Algorithm 1 Policy execution for static CVaR for one episode...Algorithm 2 Quantile Regression Distributional Q-Learning for static CVaR (see the loss sketch after the table)
Open Source Code | Yes | Additional details and results, as well as the complete code to reproduce our results can be found in the supplementary material.
Open Datasets | No | The paper mentions using "actual daily closing prices for the top 10 Dow components from 2005 to 2019" and creating a "stock price simulator" based on cited work, but does not provide a direct link, DOI, or specific repository information for accessing the exact dataset used.
Dataset Splits | No | The paper specifies training and testing periods for the option trading task ("Prices from 2005-2015 are used for training and prices from 2016-2019 for testing."), but it does not explicitly mention a separate validation split or dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or other computational resources used for experiments.
Software Dependencies | Yes | We use the implementation by Fujita et al. (2021) and made the slight modifications needed for Algorithms 1 and 2.
Experiment Setup | Yes | Unless otherwise stated, we implement Algorithm 1 and 2 and represent our policies using a neural network with two hidden layers, with ReLU activation. All our experiments use Adam as the stochastic gradient optimizer. For each action, the output consists of N = 100 quantile values. ... We use γ = 0.9 for this task. (see the network sketch after the table)
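The experiment-setup row pins down only part of the configuration (two hidden ReLU layers, Adam, N = 100 quantiles per action, γ = 0.9 for the option-trading task). The PyTorch sketch below illustrates a quantile-output Q-network matching that description; the hidden width, state dimension, and action count are assumptions not stated in the quoted excerpt, and this is not the authors' released code.

```python
# Minimal sketch of the described setup: two hidden ReLU layers and, for each
# action, N = 100 quantile values. Hidden width (64), state_dim, and n_actions
# are hypothetical placeholders, not values from the paper.
import torch
import torch.nn as nn

N_QUANTILES = 100  # quantile outputs per action (from the paper)
GAMMA = 0.9        # discount factor quoted for the option-trading task

class QuantileQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * N_QUANTILES),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Quantile estimates with shape (batch, n_actions, N_QUANTILES).
        return self.net(state).view(-1, self.n_actions, N_QUANTILES)

# Adam is the stochastic-gradient optimizer used in the paper's experiments;
# the learning rate here is an assumption.
model = QuantileQNetwork(state_dim=8, n_actions=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```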
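Algorithm 2 in the paper is a quantile-regression distributional Q-learning procedure for static CVaR; its exact update is not reproduced in this report. As a point of reference only, the sketch below shows the generic quantile-regression Huber loss popularized by QR-DQN (Dabney et al., 2018), which this family of methods builds on, under the assumption that predicted and target quantile estimates are given as (batch, N) tensors.

```python
# Generic quantile-regression Huber loss (QR-DQN style). This is NOT the
# paper's Algorithm 2; it only illustrates the distributional regression step
# that quantile-regression distributional Q-learning is built on.
import torch

def quantile_huber_loss(pred: torch.Tensor,
                        target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred, target: (batch, N) quantile estimates; target should be detached."""
    n = pred.shape[1]
    # Midpoint quantile fractions tau_i = (i + 0.5) / N for the predicted quantiles.
    tau = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors: shape (batch, N_target, N_pred).
    td = target.unsqueeze(2) - pred.unsqueeze(1)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric weight |tau - 1{td < 0}| turns the Huber loss into a quantile loss.
    weight = (tau.view(1, 1, n) - (td.detach() < 0).float()).abs()
    # Average over target quantiles and batch, sum over predicted quantiles.
    return (weight * huber / kappa).sum(dim=2).mean()
```

In standard distributional Q-learning, `target` would be the detached one-step quantiles (reward plus GAMMA times the next-state quantiles for the chosen next action); the paper's Algorithms 1 and 2 build on this with policy execution and learning tailored to the static CVaR objective.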