Safe Reinforcement Learning via Curriculum Induction
Authors: Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments use this framework in two challenging environments to induce curricula for safe and efficient learning. |
| Researcher Affiliation | Collaboration | Matteo Turchetta, Department of Computer Science, ETH Zurich, matteotu@inf.ethz.ch; Andrey Kolobov, Microsoft Research, Redmond, WA-998052, akolobov@microsoft.com; Shital Shah, Microsoft Research, Redmond, WA-998052, shitals@microsoft.com; Andreas Krause, Department of Computer Science, ETH Zurich, krausea@ethz.ch; Alekh Agarwal, Microsoft Research, Redmond, WA-998052, alekha@microsoft.com |
| Pseudocode | Yes | Algorithm 1 CISR |
| Open Source Code | Yes | We release an open source implementation of CISR and of our experiments (https://github.com/zuzuba/CISR_NeurIPS20). |
| Open Datasets | Yes | Frozen Lake and the Lunar Lander environments from OpenAI Gym [10]. |
| Dataset Splits | No | The paper describes training in RL environments (Frozen Lake and Lunar Lander) where data is generated through interaction, but it does not specify explicit training/validation/test splits with percentages, sample counts, or citations to predefined splits. It mentions evaluating policies in the original environment but not how a static dataset would be split for validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the 'Stable Baselines [25] implementation of PPO [43]' and 'GP-UCB [44]' for optimization, but it does not provide version numbers for these software components or for other dependencies. |
| Experiment Setup | Yes | For a detailed overview of the hyperparameters and the environments, see Appendices A and B. |