Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Authors: Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, Josh Tenenbaum

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete stochastic decision process with stochastic transitions, and (2) the classic ATARI game 'Montezuma's Revenge'." (Section 4: Experiments)
Researcher Affiliation | Collaboration | Tejas D. Kulkarni (DeepMind, London, tejasdkulkarni@gmail.com); Karthik R. Narasimhan (CSAIL, MIT, karthikn@mit.edu); Ardavan Saeedi (CSAIL, MIT, ardavans@mit.edu); Joshua B. Tenenbaum (BCS, MIT, jbt@mit.edu)
Pseudocode | Yes | Algorithm 1 (Learning algorithm for h-DQN), Algorithm 2 (EPSGREEDY(x, B, ϵ, Q)), and Algorithm 3 (UPDATEPARAMS(L, D)); a sketch of the two-level loop appears after the table.
Open Source Code | No | A footnote ("Sample trajectory of a run on 'Montezuma's Revenge': https://goo.gl/3Z64Ji") links to a video, but the paper provides no explicit statement or link for open-source code of the described method.
Open Datasets | Yes | "We use the Arcade Learning Environment [3] to perform experiments." [3] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012. (A loading sketch also follows the table.)
Dataset Splits | No | The paper specifies the sizes of the experience replay memories ("D1 and D2 were set to be equal to 10^6 and 5 × 10^4 respectively") but does not provide explicit training, validation, or test splits.
Hardware Specification | No | The paper does not report the hardware (e.g., CPU or GPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions the Deep Q-Learning framework and convolutional neural networks but does not list software dependencies with version numbers (e.g., Python, TensorFlow, or PyTorch versions).
Experiment Setup | Yes | "All ϵ parameters are annealed from 1 to 0.1 over 50k steps." "The learning rate is set to 2.5 × 10^-4." "The experience replay memories D1 and D2 were set to be equal to 10^6 and 5 × 10^4 respectively." "We set the learning rate to be 2.5 × 10^-4, with a discount rate of 0.99." (A configuration sketch follows the table.)