Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
Authors: Pinar Ozisik, Philip S. Thomas
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new algorithm that is more robust against data corruptions, and demonstrate its usage in practice on some RL problems, including a grid-world and a diabetes treatment simulation. ... Finally, we present experiments that support our theoretical analysis. |
| Researcher Affiliation | Academia | A. Pinar Ozisik, Philip S. Thomas; College of Information and Computer Sciences, University of Massachusetts; {pinar, pthomas}@cs.umass.edu |
| Pseudocode | Yes | For pseudocode, refer to Algorithm 1 in Appendix C. |
| Open Source Code | No | The paper refers to a third-party simulator's code ('Simglucose v0.2.1 (2018), 2019. URL https://github.com/jxx123/simglucose.'), but it does not provide explicit statements or links for the authors' own source code for the methodology or algorithm described in the paper. |
| Open Datasets | Yes | For the diabetes treatment simulation, we use a Python implementation [47] of an FDA-approved type 1 diabetes Mellitus simulator (T1DMS) by Kovatchev et al. [20] and Man et al. [28]. |
| Dataset Splits | No | The paper mentions creating a dataset of a specific size ('to a randomly created D of size 1,500') but does not provide explicit details on train, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | Yes | For the diabetes treatment simulation, we use a Python implementation [47] of an FDA-approved type 1 diabetes Mellitus simulator (T1DMS) by Kovatchev et al. [20] and Man et al. [28]. [47] Jinyu Xie. Simglucose v0.2.1 (2018), 2019. URL https://github.com/jxx123/simglucose. |
| Experiment Setup | Yes | For our experimental setup, we selected two policies per domain. We estimated J(πb) ≈ 0.797 and J(πe) ≈ 0.728 for grid-world, and J(πb) ≈ 0.218 and J(πe) ≈ 0.145 for the diabetes domain, by averaging returns obtained from running each policy 10,000 times. We added k adversarial trajectories based on the optimal attacker strategy to a randomly created D of size 1,500. ... Figure 1 shows the average LCH, over 750 trials, as k increases. |
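The Experiment Setup row describes two computations. The first is a Monte Carlo estimate of J(π), obtained by averaging returns over 10,000 runs of a policy. The second is an average of a bound over 750 trials, where each trial draws a random D of size 1,500 and appends k adversarial trajectories. A minimal Python sketch of that harness follows. It is not the authors' code, which is not released. Several pieces are hypothetical placeholders rather than the paper's estimator or its optimal attacker strategy: the episode simulator, the representation of each trajectory as one scalar estimate in [0, b], the stand-in attacker, and the use of a one-sided Hoeffding bound as `lower_bound`.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence


def estimate_performance(run_episode: Callable[[random.Random], float],
                         n_runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of J(pi): the mean return over n_runs independent episodes.

    run_episode is a hypothetical callable that simulates one episode and returns its return.
    """
    rng = random.Random(seed)
    return statistics.fmean(run_episode(rng) for _ in range(n_runs))


def hoeffding_lower_bound(values: Sequence[float], b: float = 1.0,
                          delta: float = 0.05) -> float:
    """One-sided Hoeffding (1 - delta) lower bound on the mean of i.i.d. values in [0, b].

    Used here only as a stand-in bound; the paper's estimator may differ.
    """
    n = len(values)
    return statistics.fmean(values) - b * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def average_bound_under_attack(
        sample_dataset: Callable[[random.Random, int], List[float]],
        adversarial: Callable[[int], List[float]],
        lower_bound: Callable[[Sequence[float]], float],
        k: int, n_trials: int = 750, dataset_size: int = 1_500,
        seed: int = 0) -> float:
    """Average a lower bound over n_trials, each on a fresh random D plus k adversarial items.

    sample_dataset and adversarial are hypothetical hooks: the first draws the honest
    per-trajectory estimates, the second returns the k injected ones.
    """
    rng = random.Random(seed)
    bounds = []
    for _ in range(n_trials):
        data = sample_dataset(rng, dataset_size)  # fresh random D of size dataset_size
        data = data + adversarial(k)              # append k adversarial trajectories
        bounds.append(lower_bound(data))
    return statistics.fmean(bounds)


if __name__ == "__main__":
    # Hypothetical toy: honest per-trajectory estimates are uniform in [0, 1],
    # and the stand-in attacker injects trajectories at the maximum value b = 1.0.
    honest = lambda rng, n: [rng.random() for _ in range(n)]
    attack = lambda k: [1.0] * k
    for k in (0, 10, 100):
        lb = average_bound_under_attack(honest, attack, hoeffding_lower_bound, k, n_trials=50)
        print(k, round(lb, 4))
```

Because every trial reuses the same seed-driven draws of D, changing only k isolates the effect of the injected trajectories on the averaged bound. This mirrors how a curve of the average bound against k would be produced.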