Optimizing for the Future in Non-Stationary MDPs

Authors: Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip Thomas

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | This section presents empirical evaluations using several environments inspired by real-world applications that exhibit non-stationarity. |
| Researcher Affiliation | Collaboration | (1) University of Massachusetts, MA, USA; (2) Adobe Research, CA, USA; (3) University of Alberta, AB, Canada. |
| Pseudocode | Yes | We provide a sketch of our proposed Prognosticator procedure for optimizing the future performance of the policy in Algorithm 1. |
| Open Source Code | Yes | Code for our algorithm can be obtained using the following link: https://github.com/yashchandak/OptFuture_NSMDP. |
| Open Datasets | Yes | This environment is based on an open-source implementation (Xie, 2019) of the FDA-approved Type-1 Diabetes Mellitus simulator (T1DMS) (Man et al., 2014) for treatment of Type-1 Diabetes. |
| Dataset Splits | No | The paper mentions running multiple trials and hyper-parameter sweeps, but it does not explicitly state specific train/validation/test dataset splits or their sizes. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Input: learning-rate η, time-duration δ, entropy-regularizer λ (from Algorithm 1). In our experiments, we noticed that the proposed algorithm is particularly sensitive to the value of the entropy regularizer λ. |
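The Pseudocode and Experiment Setup rows above reference the paper's Algorithm 1 (the Prognosticator) and its hyper-parameters η, δ, and λ only through short quoted fragments. The sketch below is a rough, hedged illustration of the forecast-then-optimize idea those fragments point to, run on a toy drifting two-armed bandit: past performance of the current policy is estimated counterfactually with importance sampling, a least-squares fit over time extrapolates performance δ steps ahead, and the policy ascends the gradient of that forecast with an entropy bonus. The bandit environment, polynomial basis, window size, and all constants are assumptions chosen for illustration; this is not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch only (NOT the authors' Algorithm 1): forecast-then-ascend
# on a toy drifting two-armed bandit, using hyper-parameters named eta (learning
# rate), delta (look-ahead), and lam (entropy regularizer) as in the table above.
import numpy as np

rng = np.random.default_rng(0)
eta, delta, lam = 0.1, 5, 0.01       # learning rate, look-ahead horizon, entropy weight
K, window, d = 2000, 200, 2          # interaction steps, regression window, basis size
theta = np.zeros(2)                  # softmax policy over the two arms

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

acts, behav_p, rets = [], [], []     # logged actions, behaviour probabilities, returns

for k in range(1, K + 1):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    # Non-stationary rewards: arm 0 degrades over time while arm 1 improves.
    means = np.array([1.0 - k / K, k / K])
    r = means[a] + 0.1 * rng.standard_normal()
    acts.append(a); behav_p.append(p[a]); rets.append(r)

    lo = max(0, k - window)
    t = np.arange(lo + 1, k + 1)
    a_past = np.array(acts[lo:])
    # Counterfactual estimates of the *current* policy's past performance
    # (per-step importance sampling against the logged behaviour probabilities).
    p_now = softmax(theta)
    rho = p_now[a_past] / np.array(behav_p[lo:])
    J_hat = rho * np.array(rets[lo:])
    # Gradient of each estimate w.r.t. theta (REINFORCE-style, through the IS ratio).
    dlogpi = np.eye(2)[a_past] - p_now
    dJ = J_hat[:, None] * dlogpi

    # Least-squares forecast of performance at time k + delta. The forecast is a
    # linear combination of past estimates, so its gradient is the same
    # combination of the per-estimate gradients dJ.
    Phi = np.vander(t / K, d, increasing=True)
    phi_future = np.vander(np.array([(k + delta) / K]), d, increasing=True)
    w = phi_future @ np.linalg.pinv(Phi)
    grad_forecast = (w @ dJ).ravel()

    # Entropy bonus keeps the behaviour policy stochastic; the quoted note about
    # sensitivity to lambda suggests this term matters in practice.
    logp = np.log(p_now + 1e-12)
    grad_entropy = -p_now * (logp - p_now @ logp)

    theta += eta * (grad_forecast + lam * grad_entropy)

print("final policy:", softmax(theta))   # should favour arm 1, which improves over time
```

On this toy problem the forecast term is what lets the update anticipate the drift rather than chase the current reward estimates; the hyper-parameter names mirror the table row, but their values here are arbitrary.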