Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents
Authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is evaluated through a range of experiments, including tasks in the safe-RL domain. The numerical results demonstrate its effectiveness in terms of sample efficiency and safety. |
| Researcher Affiliation | Academia | Woojun Kim¹, Yongjae Shin², Jongeui Park², Youngchul Sung² (¹Carnegie Mellon University, ²KAIST); woojunk@andrew.cmu.edu, {yongjae.shin, jongeui.park, ycsung}@kaist.ac.kr |
| Pseudocode | Yes | The pseudo-code of the overall algorithm is provided in Appendix A. |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the open-source code for the methodology described in the paper. |
| Open Datasets | Yes | "Environments: We consider both continuous and discrete tasks, including the DeepMind Control Suite (DMC) [17], MiniGrid [5], and Atari-100k [4] environments." |
| Dataset Splits | No | The paper describes using replay buffers and evaluating "test return" but does not specify explicit training/validation/test *dataset* splits in the traditional sense, since the experiments are run in reinforcement learning environments rather than on static datasets. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as particular CPU or GPU models. |
| Software Dependencies | No | The paper states "Our implementation is based on Stable Baselines3 [15]" but does not provide version numbers for Stable Baselines3 or for other key software dependencies such as Python or PyTorch (a minimal usage sketch of Stable Baselines3 follows the table). |
| Experiment Setup | Yes | Section 4.1 "Experimental Setup" details the environments, baselines, DNN architectures, reset depth, and reset frequency (a sketch of a staggered reset schedule is shown after the table). In addition, Tables 1, 4, 5, and 6 list extensive hyperparameters for the various environments and algorithms (e.g., training steps, minibatch size, optimizer, learning rate, network hidden layers/units, replay buffer size). |
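
The Experimental Setup row above refers to an ensemble of agents that are reset at a fixed frequency and to a fixed depth. For orientation only, below is a minimal sketch of such a staggered, round-robin reset schedule, assuming PyTorch-style networks; the class, function names, and default values are hypothetical and are not taken from the authors' implementation or Appendix A pseudocode.

```python
# Hypothetical sketch of a staggered ensemble reset schedule (not the authors' code).
import torch.nn as nn


def reset_last_layers(net: nn.Sequential, reset_depth: int) -> None:
    """Re-initialize the last `reset_depth` Linear layers of a network."""
    linear_layers = [m for m in net if isinstance(m, nn.Linear)]
    for layer in linear_layers[-reset_depth:]:
        layer.reset_parameters()


class ResetEnsemble:
    """Keeps N agent networks and resets one of them, in round-robin order,
    every `reset_interval` environment steps."""

    def __init__(self, make_net, n_agents=2, reset_interval=100_000, reset_depth=1):
        self.nets = [make_net() for _ in range(n_agents)]
        self.reset_interval = reset_interval
        self.reset_depth = reset_depth
        self._next_to_reset = 0

    def maybe_reset(self, env_step: int) -> None:
        # Reset only one ensemble member at a time so the others retain knowledge.
        if env_step > 0 and env_step % self.reset_interval == 0:
            reset_last_layers(self.nets[self._next_to_reset], self.reset_depth)
            self._next_to_reset = (self._next_to_reset + 1) % len(self.nets)


if __name__ == "__main__":
    # Toy usage with a small policy network; sizes are placeholders.
    make_net = lambda: nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 2))
    ensemble = ResetEnsemble(make_net, n_agents=2, reset_interval=1_000, reset_depth=1)
    for step in range(5_000):
        ensemble.maybe_reset(step)
```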
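
The Software Dependencies row notes that Stable Baselines3 is used without a pinned version. As a reading aid, here is a minimal Stable Baselines3 training call; the Pendulum-v1 environment and the hyperparameter values are placeholders and do not reproduce the paper's DMC, MiniGrid, or Atari-100k configurations.

```python
# Minimal Stable Baselines3 usage sketch; versions are not pinned in the paper.
from stable_baselines3 import SAC

# Pendulum-v1 stands in for a continuous-control task; values are illustrative only.
model = SAC("MlpPolicy", "Pendulum-v1", learning_rate=3e-4, buffer_size=1_000_000, verbose=1)
model.learn(total_timesteps=10_000)
model.save("sac_pendulum")
```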