Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
Authors: Su Young Lee, Sungik Choi, Sae-Young Chung
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically prove the convergence of the EBU method and experimentally demonstrate its performance in both deterministic and stochastic environments. Especially in 49 games of Atari 2600 domain, EBU achieves the same mean and median human normalized performance of DQN by using only 5% and 10% of samples, respectively. |
| Researcher Affiliation | Academia | Su Young Lee, Sungik Choi, Sae-Young Chung; School of Electrical Engineering, KAIST, Republic of Korea; {suyoung.l, si_choi, schung}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: Episodic Backward Update (single episode, tabular). Algorithm 2: Episodic Backward Update. See the sketch after this table. |
| Open Source Code | Yes | The code is available at https://github.com/suyoung-lee/Episodic-Backward-Update |
| Open Datasets | Yes | We use the MNIST dataset [9] for the state representation. ... We use the same set of 49 Atari 2600 games, which was evaluated in the Nature DQN paper [14]. |
| Dataset Splits | No | The paper describes training procedures and evaluation metrics but does not explicitly provide specific train/validation/test dataset splits in terms of percentages or counts for reproducibility. |
| Hardware Specification | Yes | Training time refers to the total time required to train 49 games of 10M frames using a single NVIDIA TITAN Xp for a single random seed. |
| Software Dependencies | No | The paper mentions using deep neural networks and specific algorithms (DQN, Q-learning) but does not provide specific version numbers for software libraries or frameworks (e.g., TensorFlow, PyTorch). |
| Experiment Setup | Yes | The details of the hyperparameters and the network structure are described in Appendix D. ... We use a discount factor γ = 0.99, an Adam optimizer [9] with an initial learning rate of 0.00025, and an ϵ-greedy exploration with ϵ annealed from 1.0 to 0.1 over the first 1M frames and fixed to 0.1 thereafter. We use a replay memory size of 100,000 transitions, and train the network with a mini-batch size of 32. |
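
As a quick illustration of the Pseudocode and Experiment Setup rows above, here is a minimal Python sketch of the tabular backward-sweep idea (in the spirit of Algorithm 1, not the authors' exact listing). The episode buffer, the toy `env.reset()`/`env.step()` interface, and the learning rate `ALPHA` are illustrative assumptions; `GAMMA = 0.99` and the epsilon = 0.1 behaviour policy follow the hyperparameters quoted in the table.

```python
import random
from collections import defaultdict

GAMMA = 0.99   # discount factor, as quoted in the Experiment Setup row
ALPHA = 0.5    # assumed tabular learning rate; not specified in this summary

def backward_update(Q, episode, n_actions):
    """Apply one-step Q-learning targets in reverse time order, so the final
    reward of the episode propagates through the whole trajectory in one pass."""
    for state, action, reward, next_state, done in reversed(episode):
        if done:
            target = reward
        else:
            target = reward + GAMMA * max(Q[(next_state, a)] for a in range(n_actions))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def run_episode(env, Q, n_actions, epsilon=0.1):
    """Collect one episode with an epsilon-greedy policy, then run the backward sweep.
    The env.reset()/env.step() interface used here is an assumption for illustration."""
    episode, state, done = [], env.reset(), False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        episode.append((state, action, reward, next_state, done))
        state = next_state
    backward_update(Q, episode, n_actions)

# Usage: Q = defaultdict(float); run_episode(my_env, Q, n_actions=4)
```

Sweeping the stored episode in reverse lets a reward observed at the end of the trajectory reach every earlier state-action pair in a single pass, which is the intuition behind the sample-efficiency gains the paper reports for EBU.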