Distributional Reinforcement Learning for Risk-Sensitive Policies
Authors: Shiau Hong Lim, Ilyas Malik
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both synthetic and real data, we empirically show that our proposed algorithm is able to learn better CVaR-optimized policies. |
| Researcher Affiliation | Industry | Shiau Hong Lim, IBM Research, Singapore (shonglim@sg.ibm.com); Ilyas Malik, IBM Research, Singapore (malikilyas1996@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Policy execution for static CVaR for one episode ... Algorithm 2: Quantile Regression Distributional Q-Learning for static CVaR (a generic sketch of the quantile-regression update appears below the table). |
| Open Source Code | Yes | Additional details and results, as well as the complete code to reproduce our results, can be found in the supplementary material. |
| Open Datasets | No | The paper mentions using "actual daily closing prices for the top 10 Dow components from 2005 to 2019" and creating a "stock price simulator" based on cited work, but does not provide a direct link, DOI, or specific repository information for accessing the exact dataset used. |
| Dataset Splits | No | The paper specifies training and testing periods for the option trading task ("Prices from 2005-2015 are used for training and prices from 2016-2019 for testing."), but it does not explicitly mention a separate validation split or dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models, memory, or other computational resources used for experiments. |
| Software Dependencies | Yes | We use the implementation by Fujita et al. (2021) and made the slight modifications needed for Algorithms 1 and 2. |
| Experiment Setup | Yes | Unless otherwise stated, we implement Algorithms 1 and 2 and represent our policies using a neural network with two hidden layers, with ReLU activation. All our experiments use Adam as the stochastic gradient optimizer. For each action, the output consists of N = 100 quantile values. ... We use γ = 0.9 for this task. (A minimal sketch of this setup follows the table.) |
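
To make the reported setup concrete, here is a minimal PyTorch sketch of a quantile-distributional Q-network matching the described architecture: two hidden ReLU layers and N = 100 quantile outputs per action, with greedy action selection under a CVaR criterion estimated from the predicted quantiles. This is an illustrative reconstruction under stated assumptions, not the authors' released code: the hidden width (64), the risk level `alpha`, and the names `QuantileQNetwork` and `cvar_greedy_action` are hypothetical, and the paper's Algorithm 1 (policy execution for static CVaR) involves additional bookkeeping not shown here.

```python
# Hypothetical reconstruction of the reported architecture; not the
# authors' supplementary code. Two hidden ReLU layers and N = 100
# quantile outputs per action are reported in the paper; hidden width
# 64 and alpha = 0.1 are assumptions.
import torch
import torch.nn as nn

N_QUANTILES = 100  # "the output consists of N = 100 quantile values"

class QuantileQNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions * N_QUANTILES),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Quantile estimates of the return distribution,
        # shape (batch, n_actions, N_QUANTILES).
        return self.net(obs).view(-1, self.n_actions, N_QUANTILES)

def cvar_greedy_action(net: QuantileQNetwork, obs: torch.Tensor,
                       alpha: float = 0.1) -> int:
    """Pick the action maximizing an empirical CVaR_alpha: the mean of
    the lowest ceil(alpha * N) predicted quantiles (the left tail)."""
    with torch.no_grad():
        # Predicted quantiles are not guaranteed monotone, so sort first.
        quantiles, _ = net(obs.unsqueeze(0)).sort(dim=-1)
        k = max(1, int(alpha * N_QUANTILES))
        cvar = quantiles[..., :k].mean(dim=-1)  # (1, n_actions)
        return int(cvar.argmax(dim=-1).item())
```

As in the paper, Adam would serve as the optimizer, e.g. `torch.optim.Adam(net.parameters())`.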
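
Algorithm 2 is described as quantile-regression distributional Q-learning; the generic quantile Huber loss of QR-DQN (Dabney et al., 2018) that such an algorithm minimizes per transition looks roughly as follows. This is the standard loss, not the paper's exact static-CVaR variant; `kappa = 1.0` and the tensor shapes are conventional assumptions.

```python
# Generic QR-DQN quantile Huber loss (Dabney et al., 2018) as a sketch
# of the regression step in Algorithm 2; not the paper's exact update.
import torch
import torch.nn.functional as F

def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    """pred:   (batch, N) predicted quantiles for the taken action
    target: (batch, N) Bellman-target quantile samples (detached)"""
    n = pred.shape[1]
    # Quantile midpoints tau_i = (2i + 1) / (2N).
    taus = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors td[b, i, j] = target_j - pred_i, shape (batch, N, N).
    td = target.unsqueeze(1) - pred.unsqueeze(2)
    huber = F.huber_loss(pred.unsqueeze(2).expand_as(td),
                         target.unsqueeze(1).expand_as(td),
                         reduction="none", delta=kappa)
    # Asymmetric quantile weight |tau_i - 1{td < 0}| penalizes over- and
    # under-estimation of each quantile differently.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()
```

In training, `pred` would come from the online network for the actions actually taken and `target` from the distributional Bellman backup through a target network, with gradients flowing only into `pred`.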