Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Authors: Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations demonstrate a considerable speedup and added robustness over vanilla DQN in text-based games with over a thousand discrete actions. |
| Researcher Affiliation | Collaboration | Tom Zahavy (1,2), Matan Haroush (1), Nadav Merlis (1), Daniel J. Mankowitz (3), Shie Mannor (1); affiliations: (1) The Technion, Israel Institute of Technology, (2) Google Research, (3) DeepMind |
| Pseudocode | Yes | Algorithm 1: Deep Q-learning with action elimination |
| Open Source Code | Yes | Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN |
| Open Datasets | Yes | Our code, the Zork domain, and the implementation of the elimination signal can be found at: https://github.com/TomZahavy/CB_AE_DQN |
| Dataset Splits | No | The paper describes training and evaluation protocols (e.g., discounted factor for training vs. evaluation) but does not provide explicit training/validation/test dataset splits as commonly understood in supervised learning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific models like 'NLP CNN' and 'word2vec', but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set the discount factor during training to γ = 0.8 but use γ = 1 during evaluation. We used β = 0.5, ℓ = 0.6 in all the experiments. The results are averaged over 5 random seeds, shown alongside error bars (std/3). |
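
Regarding the Pseudocode row above: Algorithm 1 in the paper pairs a DQN with an action elimination network (AEN) that flags actions likely to be invalid in the current state, so that Q-learning only considers the remaining admissible actions. The snippet below is a minimal sketch of the action-selection step only, not the authors' implementation; the function name, the `elim_probs` input, and the fallback when every action would be eliminated are our assumptions, and the paper's full elimination rule additionally uses confidence bounds from a linear contextual bandit rather than a raw probability threshold.

```python
import numpy as np

def act_with_elimination(q_values, elim_probs, ell=0.6, epsilon=0.1, rng=None):
    """Epsilon-greedy action selection restricted to non-eliminated actions.

    q_values:   array of shape (num_actions,) from the DQN.
    elim_probs: array of shape (num_actions,), the AEN's predicted probability
                that each action is invalid in the current state.
    ell:        elimination threshold (the paper reports using 0.6).
    """
    rng = rng or np.random.default_rng()
    q_values = np.asarray(q_values)
    elim_probs = np.asarray(elim_probs)

    valid = elim_probs <= ell          # keep actions the AEN does not rule out
    if not valid.any():                # safety fallback: never eliminate everything
        valid[:] = True
    valid_idx = np.flatnonzero(valid)

    if rng.random() < epsilon:         # explore only among admissible actions
        return int(rng.choice(valid_idx))
    # exploit: argmax of Q restricted to admissible actions
    return int(valid_idx[np.argmax(q_values[valid_idx])])
```

For example, calling `act_with_elimination(q, e)` with the default `ell=0.6` matches the threshold reported in the Experiment Setup row.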
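
Regarding the Experiment Setup row: the sketch below collects the reported hyperparameters in one place for a rerun. The dictionary layout and the `error_bar` helper are ours; only the numerical values (γ during training and evaluation, β, ℓ, 5 seeds, std/3 error bars) come from the paper.

```python
import numpy as np

# Values reported in the paper's experiment setup; the structure is illustrative.
CONFIG = {
    "gamma_train": 0.8,  # discount factor used during training
    "gamma_eval": 1.0,   # evaluation uses undiscounted returns (gamma = 1)
    "beta": 0.5,         # beta parameter reported for all experiments
    "ell": 0.6,          # elimination threshold reported for all experiments
    "num_seeds": 5,      # results averaged over 5 random seeds
}

def error_bar(per_seed_returns):
    """Error bar as reported in the paper: std over seeds divided by 3."""
    return float(np.std(per_seed_returns)) / 3.0
```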