Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning
Authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our framework empirically in both tabular and continuous domains (Section 7). From Section 7 (Empirical evaluation): We now empirically study our framework and examine the phenomena discussed in the previous sections. We focus on two sets of experiments: the first is in tabular settings, where we use dynamic programming methods to perform an analysis without the noise of gradient-based learning. The second builds upon Lim & Malik (2022), where we augment their model-free algorithm with our framework and evaluate it on an option trading environment. We discuss training and environment details in Appendix E, and provide additional results in Appendix A.3. |
| Researcher Affiliation | Academia | Tyler Kastner University of Toronto, Vector Institute Murat A. Erdogdu University of Toronto, Vector Institute Amir-massoud Farahmand Vector Institute, University of Toronto |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code used to run our experiments at github.com/tylerkastner/distribution-equivalence. |
| Open Datasets | Yes | We adapt the stochastic four rooms domain used in Grimm et al. (2021)... We consider the stochastic adaptation of the cliff walk environment (Sutton & Barto, 2018) as introduced in Bellemare et al. (2023)... We use the 8 by 8 frozen lake domain as specified in Brockman et al. (2016). |
| Dataset Splits | No | The paper mentions training and evaluation but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning. For example, it does not state what percentage of data was used for training versus validation. |
| Hardware Specification | Yes | For the tabular experiments, each model took roughly 1 hour to train on a single CPU... For the option trading experiments, training a policy for a given CVaR level took roughly 40 minutes on a single Tesla P100 GPU on average... |
| Software Dependencies | No | The paper mentions using specific algorithms and methods (e.g., QR-DQN, C51) and refers to code being provided on GitHub, but it does not specify software dependencies like programming language versions or library versions (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper states it discusses training details in Appendix E, but Appendix E primarily describes the environments and compute infrastructure. It does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or optimizer settings. |