State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning

Authors: Shuai Ma, Jia Yuan Yu (pp. 4512-4519)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon of 1000, with the error region representing the standard deviations of the means along the return axis. (A sketch of this averaging procedure follows this table.)
Researcher Affiliation | Academia | Shuai Ma, Jia Yuan Yu, Concordia Institute for Information Systems Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, Quebec, Canada H3G 1M8; m_shua@encs.concordia.ca, jiayuan.yu@concordia.ca
Pseudocode | Yes | Algorithm 1: State-Transition Transformation (for Case 0)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or direct links to a code repository.
Open Datasets | No | The paper constructs an MDP for a single-product stochastic inventory-control problem based on (Puterman 1994, Section 3.2.1) and defines its parameters (W, c(x), m(x), M, f(x), demand probabilities), but this is a described problem setup rather than a publicly available dataset with concrete access information.
Dataset Splits | No | The paper mentions "a simulation repeated 50 times with a time horizon 1000" but does not specify explicit training, validation, or test dataset splits or cross-validation details for a given dataset.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers).
Experiment Setup | Yes | We set the parameters as follows. The fixed order cost W = 4, the variable order cost c(x) = 2x, the maintenance fee m(x) = x, the warehouse capacity M = 2, and the price f(x) = 8x. The probabilities of demands are P(Dt = 0) = 0.25, P(Dt = 1) = 0.5, P(Dt = 2) = 0.25, respectively. The initial distribution µ(0) = 1. [...] Now we set γ = 0.95 and compare the two return distributions [...]. The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon of 1000. (A simulation sketch for this setup follows this table.)
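The Experiment Setup row lists everything needed to re-implement the single-product inventory-control MDP, so a minimal Python simulation sketch is given below. The transition and reward conventions (fixed-plus-variable order cost for nonzero orders, holding fee on post-order stock, revenue on units sold) are assumptions taken from the Puterman (1994, Section 3.2.1) model cited in the Open Datasets row, and the order-up-to-capacity policy is a hypothetical placeholder, since the excerpt does not state which policy the paper evaluates.

import numpy as np

# Parameters quoted in the Experiment Setup row.
W = 4                               # fixed order cost
M = 2                               # warehouse capacity
GAMMA = 0.95                        # discount factor
DEMAND_VALUES = np.array([0, 1, 2])
DEMAND_PROBS = np.array([0.25, 0.5, 0.25])

def c(x): return 2 * x              # variable order cost
def m(x): return x                  # maintenance (holding) fee
def f(x): return 8 * x              # price (revenue for x units sold)

def step(rng, inventory, order):
    # Transition/reward conventions assumed from Puterman (1994, Sec. 3.2.1):
    # order, pay costs, observe demand, sell what the stock allows.
    stock = inventory + order                           # must stay <= M
    demand = rng.choice(DEMAND_VALUES, p=DEMAND_PROBS)
    sold = min(stock, demand)
    order_cost = W + c(order) if order > 0 else 0
    reward = f(sold) - order_cost - m(stock)
    return stock - sold, reward

def simulate_return(rng, policy, horizon=1000, init_inventory=0):
    # mu(0) = 1 in the setup means the chain starts with empty inventory.
    inventory, ret, discount = init_inventory, 0.0, 1.0
    for _ in range(horizon):
        order = policy(inventory)
        inventory, reward = step(rng, inventory, order)
        ret += discount * reward
        discount *= GAMMA
    return ret

def order_up_to_capacity(inventory):
    # Hypothetical placeholder policy: always refill to capacity M.
    return M - inventory

rng = np.random.default_rng(0)
returns = [simulate_return(rng, order_up_to_capacity) for _ in range(50)]
print(np.mean(returns), np.std(returns))

The final two lines mirror the repeated-simulation protocol quoted in the Research Type row (50 repeats, horizon 1000); only the policy and random seed are illustrative choices.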
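For the averaged empirical return distribution described in the Research Type row, one plausible reading is that each of the 50 repeats yields a batch of return samples, the per-repeat histograms are averaged bin-wise, and the per-bin standard deviation gives the error region along the return axis. The sketch below implements that reading; the Gaussian placeholder batches stand in for returns produced by the simulator above and are not the paper's data.

import numpy as np

def averaged_return_distribution(return_batches, bins=30):
    # Pool all samples to fix a common set of bin edges, then compute one
    # normalized histogram per repeat and average the histograms bin-wise.
    pooled = np.concatenate(return_batches)
    edges = np.histogram_bin_edges(pooled, bins=bins)
    per_repeat = np.stack([np.histogram(batch, bins=edges, density=True)[0]
                           for batch in return_batches])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, per_repeat.mean(axis=0), per_repeat.std(axis=0)

# Placeholder input: 50 repeats of 200 return samples each, drawn from a
# Gaussian purely for illustration; in a reproduction these would come from
# the inventory-control simulator sketched above.
rng = np.random.default_rng(1)
batches = [rng.normal(loc=60.0, scale=5.0, size=200) for _ in range(50)]
centers, mean_density, std_density = averaged_return_distribution(batches)
# mean_density is the averaged empirical return distribution; plotting it
# against centers with a band of +/- std_density gives the error region.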