State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning
Authors: Shuai Ma, Jia Yuan Yu | Pages: 4512-4519
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon 1000, with the error region representing the standard deviations of the means along return axis. |
| Researcher Affiliation | Academia | Shuai Ma, Jia Yuan Yu Concordia Institute of Information System Engineering, Concordia University 1455 De Maisonneuve Blvd. W., Montreal, Quebec, Canada H3G 1M8 m_shua@encs.concordia.ca, jiayuan.yu@concordia.ca |
| Pseudocode | Yes | Algorithm 1 State-Transition Transformation (for Case 0) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or direct links to a code repository. |
| Open Datasets | No | The paper constructs an MDP for a single-product stochastic inventory control problem based on (Puterman 1994, Section 3.2.1) and defines its parameters (W, c(x), m(x), M, f(x), probabilities), but this is a described problem setup rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions 'a simulation repeated 50 times with a time horizon 1000' but does not specify explicit training, validation, or test dataset splits or cross-validation details for a given dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers). |
| Experiment Setup | Yes | We set the parameters as follows. The fixed order cost W = 4, the variable order cost c(x) = 2x, the maintenance fee m(x) = x, the warehouse capacity M = 2, and the price f(x) = 8x. The probabilities of demands are P(D_t = 0) = 0.25, P(D_t = 1) = 0.5, P(D_t = 2) = 0.25 respectively. The initial distribution µ(0) = 1. [...] Now we set γ = 0.95 and compare the two return distributions [...]. The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon 1000. |
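
The parameters in the Experiment Setup row are enough to sketch the inventory MDP in code. The following Python is a minimal illustration, not the authors' implementation. The parameter values come from the quoted setup. The reward structure (revenue on units sold, minus order cost, minus a maintenance fee on stock held after ordering) is an assumption modeled on the Puterman-style formulation the paper cites. The policy `order_up_to_capacity` and all function names are hypothetical. The sketch draws one discounted return per run and does not attempt to reproduce the paper's averaged empirical return distribution.

```python
import random

# Parameters quoted in the Experiment Setup row.
W = 4            # fixed order cost
M = 2            # warehouse capacity
GAMMA = 0.95     # discount factor
DEMAND_PROBS = {0: 0.25, 1: 0.5, 2: 0.25}  # P(D_t = d)


def order_cost(a):
    """Fixed cost W plus variable cost c(a) = 2a, charged only when a > 0."""
    return W + 2 * a if a > 0 else 0


def step(s, a, d):
    """One period: order a units with s in stock, then face demand d.

    Assumption (not confirmed by the quoted text): the reward is revenue on
    units sold, minus order cost, minus maintenance on stock after ordering.
    """
    assert 0 <= a <= M - s, "order must fit in the warehouse"
    stock = s + a
    sold = min(d, stock)
    reward = 8 * sold - order_cost(a) - stock  # f(x) = 8x, m(x) = x
    return stock - sold, reward


def order_up_to_capacity(s):
    """Hypothetical policy (not from the paper): always refill to M."""
    return M - s


def discounted_return(policy, horizon=1000, rng=random):
    """Simulate one trajectory from state 0 (mu(0) = 1) and return its discounted sum."""
    s, total, discount = 0, 0.0, 1.0
    demands = list(DEMAND_PROBS)
    weights = list(DEMAND_PROBS.values())
    for _ in range(horizon):
        d = rng.choices(demands, weights=weights)[0]
        s, r = step(s, policy(s), d)
        total += discount * r
        discount *= GAMMA
    return total


def simulate(policy, runs=50, horizon=1000, seed=0):
    """Collect one discounted return per independent run, with a fixed seed."""
    rng = random.Random(seed)
    return [discounted_return(policy, horizon, rng) for _ in range(runs)]
```

Calling `simulate(order_up_to_capacity)` yields 50 sampled returns whose spread can be inspected with a histogram. Swapping in a different policy function is the only change needed to compare return distributions under the same assumed reward model.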