DOM-Q-NET: Grounded RL on Structured Language
Authors: Sheng Jia, Jamie Ryan Kiros, Jimmy Ba
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations. Furthermore, we show 2x improvements in sample efficiency when training in the multi-task setting, allowing our model to transfer learned behaviours across tasks. |
| Researcher Affiliation | Collaboration | Sheng Jia (University of Toronto, Vector Institute, sheng.jia@utoronto.ca); Jamie Kiros (Google Brain, kiros@google.com); Jimmy Ba (University of Toronto, Vector Institute, jba@cs.toronto.ca) |
| Pseudocode | Yes | Algorithm 1: Multitask Learning with Shared Replay Buffer (a sketch of this loop follows the table) |
| Open Source Code | Yes | Reproducibility: Our code and demo are available at https://github.com/Sheng-J/DOM-Q-NET |
| Open Datasets | Yes | We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations. Shi et al. (2017) constructed benchmark tasks, Mini World of Bits (MiniWoB), that consist of many toy tasks of web navigation. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or their specific proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or detailed computer specifications used for running experiments. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks, such as Dopamine, Adam, Rainbow, DDQN, prioritized replay, multi-step learning, and Noisy Nets, but does not provide version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | 6.1 HYPERPARAMETERS. Table 1 (training with Rainbow DQN, 4 components): optimization algorithm Adam (Kingma & Ba, 2014); learning rate 0.00015; batch size 128; discount factor 0.99; DQN target network update period 200 online-network updates; 1 update per frame; 50 exploration steps; n-step (multi-step) bootstrap 8; Noisy Nets σ0 0.5; use DDQN: True... Table 2: Hyperparameters for DOM-Q-NET... Table 3: Hyperparameters for Replay Buffer (the Table 1 settings are collected into a config sketch below) |
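
For concreteness, the Table 1 settings quoted above can be gathered into a single configuration object. The following is a minimal Python sketch, not the authors' code: the key names are illustrative choices made here, and only the values come from the paper.

```python
# Minimal sketch (not the authors' code): the Rainbow DQN training
# hyperparameters reported in Table 1, gathered into one config dict.
# Key names are illustrative; only the values come from the paper.
RAINBOW_DQN_CONFIG = {
    "optimizer": "Adam",          # Kingma & Ba, 2014
    "learning_rate": 0.00015,
    "batch_size": 128,
    "discount_factor": 0.99,
    "target_update_period": 200,  # measured in online-network updates
    "updates_per_frame": 1,
    "exploration_steps": 50,
    "n_step_bootstrap": 8,        # multi-step return length
    "noisy_nets_sigma0": 0.5,
    "use_ddqn": True,             # Double-DQN target evaluation
}
```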
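
The pseudocode entry (Algorithm 1: Multitask Learning with Shared Replay Buffer) names the multi-task scheme, but the table does not reproduce it. The sketch below shows the general shape such a loop could take under stated assumptions: the `Task` class, the `agent` interface, and the buffer capacity are hypothetical stand-ins invented here, and the authors' released code (linked above) is the authoritative version.

```python
import random
from collections import deque

class Task:
    """Hypothetical stand-in for one MiniWoB task; not the authors' API."""
    def step(self, agent):
        # Run one environment step under the agent's current policy and
        # return a (state, action, reward, next_state, done) transition.
        raise NotImplementedError

def multitask_train(tasks, agent, num_frames, batch_size=128):
    # One replay buffer shared by every task (the core idea named by
    # Algorithm 1); the capacity here is illustrative, not from the paper.
    buffer = deque(maxlen=100_000)
    for _ in range(num_frames):
        for task in tasks:                 # interleave experience collection
            buffer.append(task.step(agent))
        if len(buffer) >= batch_size:
            # Each sampled minibatch mixes transitions from all tasks,
            # so every gradient step trains on shared experience.
            agent.update(random.sample(buffer, batch_size))
```

Sampling minibatches from one shared buffer lets each update draw on experience from every task, which is consistent with the 2x sample-efficiency improvement the paper reports for the multi-task setting.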