Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning
Authors: Shuai Ma, Jia Yuan Yu | Pages: 4512-4519
AAAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon 1000, with the error region representing the standard deviations of the means along return axis. |
| Researcher Affiliation | Academia | Shuai Ma, Jia Yuan Yu; Concordia Institute of Information System Engineering, Concordia University, 1455 De Maisonneuve Blvd. W., Montreal, Quebec, Canada H3G 1M8 |
| Pseudocode | Yes | Algorithm 1 State-Transition Transformation (for Case 0) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or direct links to a code repository. |
| Open Datasets | No | The paper constructs an MDP for a single-product stochastic inventory control problem based on (Puterman 1994, Section 3.2.1) and defines its parameters (W, c(x), m(x), M, f(x), probabilities), but this is a described problem setup rather than a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions 'a simulation repeated 50 times with a time horizon 1000' but does not specify explicit training, validation, or test dataset splits or cross-validation details for a given dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific solvers). |
| Experiment Setup | Yes | We set the parameters as follows. The fixed order cost W = 4, the variable order cost c(x) = 2x, the maintenance fee m(x) = x, the warehouse capacity M = 2, and the price f(x) = 8x. The probabilities of demands are P(Dt = 0) = 0.25, P(Dt = 1) = 0.5, P(Dt = 2) = 0.25 respectively. The initial distribution µ(0) = 1. [...] Now we set γ = 0.95 and compare the two return distributions [...]. The averaged empirical return distribution is from a simulation repeated 50 times with a time horizon 1000. |
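
The setup quoted in the Experiment Setup row can be sketched as a small simulation. This is a minimal illustration, not the authors' code: the reward form (revenue minus order cost minus maintenance fee, as in the textbook inventory model the paper cites), the `policy` function, and all function names are assumptions introduced here. Only the parameter values, the initial state, the discount factor, and the 50 × 1000 simulation size come from the quoted text, and treating each of the 50 repetitions as one discounted return is just one reading of that sentence.

```python
import random

# Parameter values quoted in the Experiment Setup row.
W = 4                                      # fixed order cost
M = 2                                      # warehouse capacity
GAMMA = 0.95                               # discount factor
DEMAND_PROBS = {0: 0.25, 1: 0.5, 2: 0.25}  # P(Dt = d)


def order_cost(x):
    """Fixed cost W plus variable cost c(x) = 2x; zero if nothing is ordered."""
    return W + 2 * x if x > 0 else 0


def maintenance_fee(x):
    """m(x) = x."""
    return x


def price(x):
    """f(x) = 8x."""
    return 8 * x


def step(state, action, rng):
    """One transition. The reward form is an ASSUMPTION (textbook inventory model)."""
    assert 0 <= action <= M - state, "order must fit in the warehouse"
    demand = rng.choices(list(DEMAND_PROBS), weights=list(DEMAND_PROBS.values()))[0]
    stock = state + action
    sold = min(stock, demand)
    reward = price(sold) - order_cost(action) - maintenance_fee(stock)
    return stock - sold, reward


def policy(state):
    """HYPOTHETICAL policy for illustration: refill to capacity when empty."""
    return M if state == 0 else 0


def discounted_return(horizon, rng):
    """Discounted return of one run starting from state 0 (initial distribution µ(0) = 1)."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(horizon):
        state, reward = step(state, policy(state), rng)
        total += discount * reward
        discount *= GAMMA
    return total


def simulate(runs=50, horizon=1000, seed=0):
    """Collect one discounted return per run; the seed is an assumption for repeatability."""
    rng = random.Random(seed)
    return [discounted_return(horizon, rng) for _ in range(runs)]
```

Calling `simulate()` returns 50 sampled returns whose histogram approximates a return distribution under the assumed policy. Swapping in a different `policy` or reward form only requires changing those two functions.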