A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Authors: Chen Tessler, Shahar Givony, Tom Zahavy, Daniel Mankowitz, Shie Mannor
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results for learning an H-DRLN in sub-domains of Minecraft with a DSN array and a distilled skill network. We also verify the improved convergence guarantees for utilizing reusable DSNs (a.k.a. options) within the H-DRLN, compared to the vanilla DQN. |
| Researcher Affiliation | Academia | Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor, Technion - Israel Institute of Technology, Haifa, Israel. Emails: {chen.tessler, shahargiv, tomzahavy}@campus.technion.ac.il, danielm@tx.technion.ac.il, shie@ee.technion.ac.il |
| Pseudocode | No | The paper describes its methods but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that code is released or available. |
| Open Datasets | No | The paper describes custom sub-domains created within Minecraft (e.g., "Navigation 1 domain", "two-room domain", "complex Minecraft domain") but does not provide concrete access information (link, DOI, repository name, formal citation with authors/year) for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes evaluation during training ("Evaluation: the agent is evaluated during training using the current learned architecture every 20k (5k) optimization steps (a single epoch)"), but it does not provide distinct training, validation, and test splits with exact percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms like 'Vanilla DQN' and 'DDQN' but does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | The best parameter settings that we found include: (1) a higher learning ratio (iterations between emulator states, n_replay = 16), (2) a higher learning rate (learning_rate = 0.0025), and (3) less exploration (eps_endt = 400K). We also found that a smaller experience replay (replay_memory = 100K) provided improved performance... |
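
For concreteness, the settings quoted in the Experiment Setup row can be collected into a single training configuration. This is a minimal sketch: only the numeric values come from the paper; the key names, and the `evaluate_every` entry taken from the evaluation schedule quoted in the Dataset Splits row, are illustrative rather than the authors' code.

```python
# Minimal sketch of the reported best hyperparameters. Only the numeric
# values come from the paper; the key names are illustrative.
hdrln_config = {
    "n_replay": 16,            # learning ratio: optimization iterations per emulator state
    "learning_rate": 0.0025,   # higher than the 0.00025 typical for vanilla DQN
    "eps_endt": 400_000,       # anneal exploration epsilon over 400K steps
    "replay_memory": 100_000,  # smaller experience replay than the usual 1M
    "evaluate_every": 20_000,  # evaluate the current network each epoch (5K in some domains)
}
```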
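The Research Type row describes an H-DRLN that augments a DQN-style controller with reusable deep skill networks (options). Below is a minimal sketch of that control loop under assumed interfaces; `controller`, `skills`, `env`, and their methods are hypothetical names, not the authors' implementation.

```python
def hdrln_macro_step(controller, skills, env, state):
    """One macro-action of an H-DRLN-style agent (illustrative sketch).

    The controller's Q-network scores primitive actions and skills
    together; a chosen skill runs its pre-trained DSN policy until its
    termination condition fires. All interfaces here are assumptions.
    """
    choice = controller.greedy_choice(state)      # argmax over primitives + skills
    if choice < controller.n_primitives:
        return env.step(choice)                   # (state, reward, done) for one primitive
    skill = skills[choice - controller.n_primitives]
    total_reward, done = 0.0, False
    while not (done or skill.terminated(state)):  # run the option until it terminates
        state, reward, done = env.step(skill.act(state))
        total_reward += reward                    # accumulate the option-level reward
    return state, total_reward, done
```

Executing a skill as a single temporally extended action is what underlies the improved convergence the response attributes to reusing DSNs, relative to the vanilla DQN's one-step primitive actions.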