Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Authors: Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, Josh Tenenbaum
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the strength of our approach on two problems with very sparse and delayed feedback: (1) a complex discrete stochastic decision process with stochastic transitions, and (2) the classic ATARI game Montezuma's Revenge. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Tejas D. Kulkarni (DeepMind, London; tejasdkulkarni@gmail.com), Karthik R. Narasimhan (CSAIL, MIT; karthikn@mit.edu), Ardavan Saeedi (CSAIL, MIT; ardavans@mit.edu), Joshua B. Tenenbaum (BCS, MIT; jbt@mit.edu) |
| Pseudocode | Yes | Algorithm 1: Learning algorithm for h-DQN; Algorithm 2: EPSGREEDY(x, B, ϵ, Q); Algorithm 3: UPDATEPARAMS(L, D) (see the epsilon-greedy sketch after the table) |
| Open Source Code | No | The paper includes a footnote ("Sample trajectory of a run on Montezuma's Revenge: https://goo.gl/3Z64Ji") that links to a video, but it does not provide any explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | We use the Arcade Learning Environment [3] to perform experiments. [3] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012. (see the environment-setup sketch after the table) |
| Dataset Splits | No | The paper specifies the sizes of the experience replay memories (D1 and D2 were set to be equal to 10^6 and 5 × 10^4, respectively) but does not provide explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the Deep Q-Learning framework and convolutional neural networks but does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | all ϵ parameters are annealed from 1 to 0.1 over 50k steps. The learning rate is set to 2.5 × 10^-4. The experience replay memories D1 and D2 were set to be equal to 10^6 and 5 × 10^4, respectively. We set the learning rate to be 2.5 × 10^-4, with a discount rate of 0.99. (see the hyperparameter sketch after the table) |
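
For readers reproducing the pseudocode, here is a minimal Python sketch of an ϵ-greedy selection routine in the spirit of Algorithm 2's EPSGREEDY(x, B, ϵ, Q). The callable interface `Q(x, b)` is an assumption made for brevity; in the paper the controllers are DQNs that output values for all actions (or goals) at once.

```python
import random


def eps_greedy(x, B, eps, Q):
    """Epsilon-greedy selection in the spirit of EPSGREEDY(x, B, eps, Q).

    x   -- current input (state, or state concatenated with a goal)
    B   -- set of admissible actions (or goals, at the meta-controller level)
    eps -- exploration probability
    Q   -- callable Q(x, b) returning an estimated value (assumed interface)
    """
    if random.random() < eps:
        # Explore: pick uniformly at random from B.
        return random.choice(list(B))
    # Exploit: pick the element of B with the highest estimated value.
    return max(B, key=lambda b: Q(x, b))
```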
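
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The names (`HDQNConfig`, `epsilon`) are ours, and the linear shape of the annealing schedule is an assumption consistent with standard DQN practice; the quoted excerpt only states the range (1 to 0.1) and the step count (50k).

```python
from dataclasses import dataclass


@dataclass
class HDQNConfig:
    # Values reported in the paper's experiment setup; field names are ours.
    lr: float = 2.5e-4                 # learning rate
    gamma: float = 0.99                # discount rate
    eps_start: float = 1.0             # initial exploration rate
    eps_end: float = 0.1               # final exploration rate
    eps_anneal_steps: int = 50_000     # annealing horizon
    replay_size_d1: int = 1_000_000    # experience replay memory D1
    replay_size_d2: int = 50_000       # experience replay memory D2


def epsilon(step: int, cfg: HDQNConfig = HDQNConfig()) -> float:
    """Anneal epsilon from eps_start to eps_end over eps_anneal_steps
    (linear schedule assumed)."""
    frac = min(step / cfg.eps_anneal_steps, 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)
```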
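
Since the experiments rely on the Arcade Learning Environment, a minimal environment-setup sketch follows. It assumes the gym Atari wrapper; the environment id and the reset/step signatures vary across gym/gymnasium releases and are not taken from the paper.

```python
import gym

# "MontezumaRevengeNoFrameskip-v4" is registered by gym's Atari extras;
# the exact id and API (4-tuple vs. 5-tuple step) depend on the installed version.
env = gym.make("MontezumaRevengeNoFrameskip-v4")

obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy, just to exercise the loop
    obs, reward, done, info = env.step(action)  # classic gym 4-tuple API assumed
    total_reward += reward
env.close()
```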