reproducibilityindex.ai

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Authors: Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions.
Researcher Affiliation	Collaboration	Tom Zahavy (1, 2), Matan Haroush (1), Nadav Merlis (1), Daniel J. Mankowitz (3), Shie Mannor (1). Affiliations: (1) The Technion - Israel Institute of Technology, (2) Google Research, (3) DeepMind
Pseudocode	Yes	Algorithm 1 deep Q-learning with action elimination
Open Source Code	Yes	Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN
Open Datasets	Yes	Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN
Dataset Splits	No	The paper describes training and evaluation protocols (e.g., discounted factor for training vs. evaluation) but does not provide explicit training/validation/test dataset splits as commonly understood in supervised learning.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies	No	The paper mentions using specific models like 'NLP CNN' and 'word2vec', but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	We set the discounted factor during training to γ = 0.8 but use γ = 1 during evaluation. We used β = 0.5, ℓ = 0.6 in all the experiments. The results are averaged over 5 random seeds, shown alongside error bars (std/3).
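
The reporting protocol quoted in the Experiment Setup row (γ = 0.8 for training, γ = 1 for evaluation, 5 seeds, error bars of std/3) can be illustrated with a small sketch. This is not the authors' code: the function names `discounted_return` and `summarize_over_seeds` are hypothetical, and the use of the population standard deviation is an assumption, since the quoted text does not say which estimator was used.

```python
from statistics import mean, pstdev

GAMMA_TRAIN = 0.8  # discount factor quoted for training
GAMMA_EVAL = 1.0   # discount factor quoted for evaluation
NUM_SEEDS = 5      # number of random seeds quoted in the setup


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode (hypothetical helper)."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def summarize_over_seeds(per_seed_scores):
    """Return (mean, std / 3) across seeds.

    Assumption: population standard deviation (pstdev); the source
    only states that the error bars are std/3.
    """
    center = mean(per_seed_scores)
    error_bar = pstdev(per_seed_scores) / 3
    return center, error_bar
```

With γ = 1 the evaluation return reduces to the plain sum of episode rewards, which makes scores comparable across episodes of different lengths; the training discount of 0.8 only affects the learning targets.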