Optimizing for the Future in Non-Stationary MDPs

Authors: Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip Thomas

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | This section presents empirical evaluations using several environments inspired by real-world applications that exhibit non-stationarity. |
| Researcher Affiliation | Collaboration | (1) University of Massachusetts, MA, USA; (2) Adobe Research, CA, USA; (3) University of Alberta, AB, Canada. |
| Pseudocode | Yes | We provide a sketch of our proposed Prognosticator procedure for optimizing the future performance of the policy in Algorithm 1. |
| Open Source Code | Yes | Code for our algorithm can be obtained using the following link: https://github.com/yashchandak/OptFuture_NSMDP. |
| Open Datasets | Yes | This environment is based on an open-source implementation (Xie, 2019) of the FDA-approved Type-1 Diabetes Mellitus simulator (T1DMS) (Man et al., 2014) for treatment of Type-1 Diabetes. |
| Dataset Splits | No | The paper mentions running multiple trials and hyper-parameter sweeps, but it does not explicitly state specific train/validation/test dataset splits or their sizes. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Input: learning-rate η, time-duration δ, entropy-regularizer λ (from Algorithm 1). In our experiments, we noticed that the proposed algorithm is particularly sensitive to the value of the entropy regularizer λ. |
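The Pseudocode and Experiment Setup rows above reference the paper's Algorithm 1 (the Prognosticator) and its hyper-parameters η, δ, and λ only through short quoted fragments. The sketch below is a rough, hedged illustration of the forecast-then-optimize idea those fragments point to, run on a toy drifting two-armed bandit: past performance of the current policy is estimated counterfactually with importance sampling, a least-squares fit over time extrapolates performance δ steps ahead, and the policy ascends the gradient of that forecast with an entropy bonus. The bandit environment, polynomial basis, window size, and all constants are assumptions chosen for illustration; this is not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch only (NOT the authors' Algorithm 1): forecast-then-ascend
# on a toy drifting two-armed bandit, using hyper-parameters named eta (learning
# rate), delta (look-ahead), and lam (entropy regularizer) as in the table above.
import numpy as np

rng = np.random.default_rng(0)
eta, delta, lam = 0.1, 5, 0.01       # learning rate, look-ahead horizon, entropy weight
K, window, d = 2000, 200, 2          # interaction steps, regression window, basis size
theta = np.zeros(2)                  # softmax policy over the two arms

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

acts, behav_p, rets = [], [], []     # logged actions, behaviour probabilities, returns

for k in range(1, K + 1):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    # Non-stationary rewards: arm 0 degrades over time while arm 1 improves.
    means = np.array([1.0 - k / K, k / K])
    r = means[a] + 0.1 * rng.standard_normal()
    acts.append(a); behav_p.append(p[a]); rets.append(r)

    lo = max(0, k - window)
    t = np.arange(lo + 1, k + 1)
    a_past = np.array(acts[lo:])
    # Counterfactual estimates of the *current* policy's past performance
    # (per-step importance sampling against the logged behaviour probabilities).
    p_now = softmax(theta)
    rho = p_now[a_past] / np.array(behav_p[lo:])
    J_hat = rho * np.array(rets[lo:])
    # Gradient of each estimate w.r.t. theta (REINFORCE-style, through the IS ratio).
    dlogpi = np.eye(2)[a_past] - p_now
    dJ = J_hat[:, None] * dlogpi

    # Least-squares forecast of performance at time k + delta. The forecast is a
    # linear combination of past estimates, so its gradient is the same
    # combination of the per-estimate gradients dJ.
    Phi = np.vander(t / K, d, increasing=True)
    phi_future = np.vander(np.array([(k + delta) / K]), d, increasing=True)
    w = phi_future @ np.linalg.pinv(Phi)
    grad_forecast = (w @ dJ).ravel()

    # Entropy bonus keeps the behaviour policy stochastic; the quoted note about
    # sensitivity to lambda suggests this term matters in practice.
    logp = np.log(p_now + 1e-12)
    grad_entropy = -p_now * (logp - p_now @ logp)

    theta += eta * (grad_forecast + lam * grad_entropy)

print("final policy:", softmax(theta))   # should favour arm 1, which improves over time
```

On this toy problem the forecast term is what lets the update anticipate the drift rather than chase the current reward estimates; the hyper-parameter names mirror the table row, but their values here are arbitrary.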