Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents

Authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The proposed method is evaluated through a range of experiments, including safe-RL tasks; the numerical results demonstrate its high sample efficiency and improved safety. |
| Researcher Affiliation | Academia | Woojun Kim¹, Yongjae Shin², Jongeui Park², Youngchul Sung². ¹Carnegie Mellon University, ²KAIST. woojunk@andrew.cmu.edu, {yongjae.shin, jongeui.park, ycsung}@kaist.ac.kr |
| Pseudocode | Yes | "The pseudo-code of the overall algorithm is provided in Appendix A." |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to open-source code for the described methodology. |
| Open Datasets | Yes | "Environments: We consider both continuous and discrete tasks including Deep Mind Control Suite (DMC) [17], Minigrid [5], and Atari-100k [4] environments." |
| Dataset Splits | No | The paper describes replay buffers and reports "test return", but does not specify explicit training/validation/test *dataset* splits in the traditional sense, since it concerns reinforcement learning environments rather than static datasets. |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as particular CPU or GPU models. |
| Software Dependencies | No | The paper states "Our implementation is based on Stable Baseline3 [15]" but does not provide version numbers for Stable Baselines3 or for other key software dependencies such as Python or PyTorch. |
| Experiment Setup | Yes | Section 4.1 "Experimental Setup" details environments, baselines, DNN architectures, reset depth, and reset frequency. Tables 1, 4, 5, and 6 additionally list extensive hyperparameters for the various environments and algorithms (e.g., training steps, minibatch size, optimizer, learning rate, network hidden layers/units, replay buffer size). |