Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
Authors: Haiyan Yin, Sinno Jialin Pan
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use Atari 2600 games as testing environment to demonstrate the efficiency and effectiveness of our proposed solution for policy distillation. |
| Researcher Affiliation | Academia | Haiyan Yin, Sinno Jialin Pan School of Computer Science and Engineering Nanyang Technological University, Singapore {haiyanyin, sinnopan}@ntu.edu.sg |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use Atari 2600 games as testing environment to demonstrate the efficiency and effectiveness of our proposed solution for policy distillation. |
| Dataset Splits | No | The paper describes how experiences are generated for training and how evaluation is performed periodically, but it does not specify training/validation/test dataset splits with percentages or fixed counts, as would be expected for a static supervised-learning dataset. |
| Hardware Specification | No | The paper vaguely mentions "With modern GPUs" but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions the use of the RMSProp algorithm for optimization, and refers to DQN, but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The network architecture used to train the single-task teacher DQN is identical to (Mnih et al. 2015). For student network, we used the proposed architecture as shown in Figure 1, where the convolutional layers from teacher networks are used to generate task-specific input features with a dimension size of 3,136. Moreover, the student network has two fully connected layers, with each consisting of 1,028 and 512 neurons respectively, and an output layer of 18 units. Each output corresponds to one control action in Atari games. The value for ϵ linearly decays from 1 to 0.1 within first 1 million steps. The student performs one mini-batch update by sampling experience from each teacher’s replay memory at every 4 steps of playing. When using hierarchical prioritized experience replay, the number of partitions for each replay memory is set to be 5. Each partition can store up to 200,000 experiences. When using uniform sampling, the replay memory capacity is set to be 500,000. To avoid the agent from memorizing the steps, a random number of null operations (up to 30) are generated at the start of each episode. |
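
The setup row above pins down every layer size in the student network (3,136-dimensional convolutional features from the teacher, fully connected layers of 1,028 and 512 units, and 18 outputs). The sketch below shows how those dimensions fit together; the framework (PyTorch) and the ReLU activations are assumptions for illustration, since the paper does not name its software stack and releases no code.

```python
# Minimal sketch of the reported student architecture. Layer sizes come from the
# experiment-setup excerpt; the framework, activations, and class names are
# assumptions, not the authors' implementation.
import torch.nn as nn


class TeacherConvFeatures(nn.Module):
    """DQN-style conv stack (Mnih et al. 2015); its flattened output is 7*7*64 = 3,136."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, frames):  # frames: (batch, 4, 84, 84) stacked grayscale frames
        return self.conv(frames)  # -> (batch, 3136) task-specific features


class StudentNetwork(nn.Module):
    """Fully connected student layers on top of task-specific conv features."""

    def __init__(self, feature_dim=3136, n_actions=18):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, 1028), nn.ReLU(),
            nn.Linear(1028, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # one unit per Atari control action
        )

    def forward(self, features):
        return self.fc(features)
```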
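The replay-memory capacities are likewise stated exactly: five partitions of 200,000 experiences each under hierarchical prioritized replay, or a single 500,000-capacity memory under uniform sampling. The sketch below only mirrors those capacities; how experiences are assigned to partitions and how partitions are weighted during sampling are the paper's hierarchical prioritization scheme, which the uniform placeholder choices here do not reproduce.

```python
# Sketch of a partitioned replay memory with the capacities from the setup row.
# Partition assignment and sampling weights are placeholders, not the authors' scheme.
import random
from collections import deque


class PartitionedReplayMemory:
    """Replay memory split into fixed-capacity partitions (5 x 200,000 in the paper)."""

    def __init__(self, n_partitions=5, partition_capacity=200_000):
        self.partitions = [deque(maxlen=partition_capacity) for _ in range(n_partitions)]

    def add(self, experience, partition_idx):
        # In the paper the partition index would follow from the experience's priority;
        # here the caller supplies it directly.
        self.partitions[partition_idx].append(experience)

    def sample(self, batch_size):
        # Placeholder two-level sampling: pick a non-empty partition uniformly,
        # then draw a mini-batch uniformly within it.
        non_empty = [p for p in self.partitions if len(p) >= batch_size]
        partition = random.choice(non_empty)
        return random.sample(list(partition), batch_size)
```

In the reported setup, the student draws one such mini-batch from each teacher's replay memory every 4 steps of play.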
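Finally, the exploration schedule and episode starts quoted above (ϵ annealed linearly from 1 to 0.1 over the first 1 million steps, up to 30 random null operations per episode) can be written down directly. The Gym-style `reset`/`step` interface below is an assumption; the paper does not specify its environment API.

```python
import random


def epsilon_at(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linear epsilon annealing from 1.0 to 0.1 over the first 1M steps, then held constant."""
    if step >= decay_steps:
        return end
    return start + (end - start) * step / decay_steps


def noop_start(env, max_noops=30, noop_action=0):
    """Apply a random number of null operations (up to 30) at the start of an episode."""
    state = env.reset()
    for _ in range(random.randint(0, max_noops)):
        state, _, _, _ = env.step(noop_action)  # assumes a Gym-style 4-tuple step API
    return state
```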