Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents

Authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The proposed method is evaluated through a range of experiments, including safe-RL tasks; the numerical results demonstrate its high sample efficiency and improved safety. |
| Researcher Affiliation | Academia | Woojun Kim¹, Yongjae Shin², Jongeui Park², Youngchul Sung². ¹Carnegie Mellon University, ²KAIST. woojunk@andrew.cmu.edu, {yongjae.shin, jongeui.park, ycsung}@kaist.ac.kr |
| Pseudocode | Yes | "The pseudo-code of the overall algorithm is provided in Appendix A." |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to open-source code for the described methodology. |
| Open Datasets | Yes | "Environments: We consider both continuous and discrete tasks including Deep Mind Control Suite (DMC) [17], Minigrid [5], and Atari-100k [4] environments." |
| Dataset Splits | No | The paper describes replay buffers and reports "test return", but does not specify explicit training/validation/test *dataset* splits in the traditional sense, since it concerns reinforcement learning environments rather than static datasets. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as particular CPU or GPU models. |
| Software Dependencies | No | The paper states "Our implementation is based on Stable Baseline3 [15]" but does not provide version numbers for Stable Baselines3 or for other key software dependencies such as Python or PyTorch. |
| Experiment Setup | Yes | Section 4.1 "Experimental Setup" details environments, baselines, DNN architectures, reset depth, and reset frequency. Tables 1, 4, 5, and 6 additionally list extensive hyperparameters for the various environments and algorithms (e.g., training steps, minibatch size, optimizer, learning rate, network hidden layers/units, replay buffer size). |