Optimizing for the Future in Non-Stationary MDPs
Authors: Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip Thomas
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents empirical evaluations using several environments inspired by real-world applications that exhibit non-stationarity. |
| Researcher Affiliation | Collaboration | 1University of Massachusetts, MA, USA. 2Adobe Research, CA, USA. 3University of Alberta, AB, Canada. |
| Pseudocode | Yes | We provide a sketch of our proposed Prognosticator procedure for optimizing the future performance of the policy in Algorithm 1. |
| Open Source Code | Yes | Code for our algorithm can be obtained using the following link: https://github.com/yashchandak/OptFuture_NSMDP. |
| Open Datasets | Yes | This environment is based on an open-source implementation (Xie, 2019) of the FDA approved Type-1 Diabetes Mellitus simulator (T1DMS) (Man et al., 2014) for treatment of Type-1 Diabetes. |
| Dataset Splits | No | The paper mentions running multiple trials and hyper-parameter sweeps, but it does not explicitly state specific train/validation/test dataset splits or their sizes. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Algorithm 1 lists the inputs: learning-rate η, time-duration δ, and entropy-regularizer λ. In our experiments, we noticed that the proposed algorithm is particularly sensitive to the value of the entropy regularizer λ. (An illustrative sketch of how these inputs fit together follows this table.) |
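
The rows above note that Algorithm 1 (the Prognosticator) takes a learning rate η, a time duration δ, and an entropy regularizer λ as inputs, and that it optimizes the policy's *future* performance. As a reading aid, the snippet below is a minimal, self-contained sketch of the forecasting idea that wording points to: treat estimates of the current policy's past performance as a time series and extrapolate δ episodes ahead with least squares. The polynomial basis, the `forecast_future_performance` helper, and the synthetic data are illustrative assumptions, not the paper's exact design; the quoted rows and the linked repository remain the authoritative description.

```python
# Illustrative sketch (not the authors' exact Algorithm 1): fit a least-squares
# trend to past performance estimates of the current policy and forecast its
# performance `delta` episodes into the future.
import numpy as np

def forecast_future_performance(past_perf, delta, degree=2):
    """Least-squares fit of performance vs. episode index, extrapolated
    `delta` episodes ahead; returns the mean forecasted performance."""
    past_perf = np.asarray(past_perf, dtype=float)
    t = np.arange(1, len(past_perf) + 1)
    # Polynomial basis is an assumption made here for brevity.
    coeffs = np.polyfit(t, past_perf, deg=min(degree, len(past_perf) - 1))
    future_t = np.arange(len(past_perf) + 1, len(past_perf) + delta + 1)
    return float(np.polyval(coeffs, future_t).mean())

if __name__ == "__main__":
    # Synthetic drifting performance curve standing in for per-episode
    # performance estimates in a non-stationary environment.
    rng = np.random.default_rng(0)
    past = 10 + 0.3 * np.arange(50) + rng.normal(0.0, 1.0, size=50)
    print(forecast_future_performance(past, delta=5))
```

In the full procedure, a gradient step of size η would be taken on such a forecast with respect to the policy parameters, together with a λ-weighted entropy term; estimating that gradient is outside the scope of this sketch.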