Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Authors: Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
Researcher Affiliation | Collaboration | Tom Zahavy (1,2), Matan Haroush (1), Nadav Merlis (1), Daniel J. Mankowitz (3), Shie Mannor (1); (1) Technion, Israel Institute of Technology, (2) Google Research, (3) DeepMind
Pseudocode | Yes | Algorithm 1: deep Q-learning with action elimination (a minimal sketch follows this table)
Open Source Code | Yes | Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN
Open Datasets | Yes | Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN
Dataset Splits | No | The paper describes training and evaluation protocols (e.g., different discount factors for training and evaluation) but does not provide explicit training/validation/test splits in the supervised-learning sense.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions specific components such as an NLP CNN and word2vec embeddings, but does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We set the discount factor during training to γ = 0.8 but use γ = 1 during evaluation. We used β = 0.5, ℓ = 0.6 in all the experiments. The results are averaged over 5 random seeds, shown alongside error bars (std/3).
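
As a reading aid for the Pseudocode row above, here is a minimal, hedged sketch of the action-elimination DQN loop in the spirit of Algorithm 1. The `QNetwork`, `EliminationNetwork`, and replay-buffer interfaces (`q_values`, `predict_with_confidence`, `train_step`, `add`, `sample`) are hypothetical placeholders, not the authors' API, and the confidence-bound elimination test only approximates the rule used in the paper and in the CB_AE_DQN repository.

```python
import random

def admissible_actions(aen, state, actions, ell, beta):
    """Keep actions that the elimination network cannot confidently rule out."""
    admissible = []
    for a in actions:
        # Assumed API: predicted elimination probability plus a confidence width.
        p, width = aen.predict_with_confidence(state, a)
        if p - beta * width <= ell:      # eliminate only when confidently above the threshold ell
            admissible.append(a)
    return admissible or list(actions)   # never eliminate every action

def ae_dqn_episode(env, q_net, aen, replay, actions,
                   gamma=0.8, ell=0.6, beta=0.5, epsilon=0.1):
    """One episode of DQN restricted to the admissible action set."""
    state, done = env.reset(), False
    while not done:
        valid = admissible_actions(aen, state, actions, ell, beta)
        if random.random() < epsilon:
            action = random.choice(valid)               # explore within the admissible set
        else:
            q = q_net.q_values(state)                   # assumed API: array indexed by action id
            action = max(valid, key=lambda a: q[a])     # greedy over admissible actions only
        next_state, reward, done, info = env.step(action)
        e_signal = info.get("elimination", 0)           # environment-provided elimination signal
        replay.add(state, action, reward, next_state, done, e_signal)
        q_net.train_step(replay.sample(), gamma)        # standard DQN update
        aen.train_step(replay.sample())                 # supervised update on the elimination signal
        state = next_state
```

In the paper, the elimination decision is made by a linear contextual bandit with confidence bounds on top of the AEN's last-layer features; the sketch above collapses that machinery into the assumed `predict_with_confidence` call.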
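The Experiment Setup row reports training with γ = 0.8, undiscounted evaluation (γ = 1), and results averaged over 5 random seeds with std/3 error bars. The snippet below is only a hypothetical illustration of that reporting convention; the constant names and the `summarize` helper are not from the paper.

```python
import numpy as np

TRAIN_GAMMA = 0.8   # discount used for the DQN targets during training
EVAL_GAMMA = 1.0    # evaluation reports undiscounted episode return
NUM_SEEDS = 5       # results averaged over 5 random seeds

def summarize(returns_per_seed):
    """Mean return and the std/3 error bar used in the paper's plots."""
    returns = np.asarray(returns_per_seed, dtype=float)  # shape: (NUM_SEEDS,)
    return returns.mean(), returns.std() / 3.0
```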