DOM-Q-NET: Grounded RL on Structured Language

Authors: Sheng Jia, Jamie Ryan Kiros, Jimmy Ba

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations. Furthermore, we show 2x improvements in sample efficiency when training in the multi-task setting, allowing our model to transfer learned behaviours across tasks." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Sheng Jia (University of Toronto, Vector Institute) sheng.jia@utoronto.ca; Jamie Kiros (Google Brain) kiros@google.com; Jimmy Ba (University of Toronto, Vector Institute) jba@cs.toronto.ca
Pseudocode | Yes | Algorithm 1: Multitask Learning with Shared Replay Buffer (a minimal code sketch follows this summary)
Open Source Code | Yes | "Reproducibility: Our code and demo are available at https://github.com/Sheng-J/DOM-Q-NET"
Open Datasets | Yes | "We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations." Shi et al. (2017) constructed the Mini World of Bits (MiniWoB) benchmark, which consists of many toy web-navigation tasks.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or their proportions.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or other specifications of the machines used to run the experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as Dopamine, Adam, Rainbow, DDQN, prioritized replay, multi-step learning (an n-step target sketch follows this summary), and Noisy Nets, but does not give version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | Section 6.1, Hyperparameters. Table 1 (training with Rainbow DQN, 4 components): optimization algorithm Adam (Kingma & Ba, 2014); learning rate 0.00015; batch size 128; discount factor 0.99; DQN target network update period 200 online-network updates; number of updates per frame 1; number of exploration steps 50; N-step (multi-step) bootstrap 8; Noisy Nets σ0 0.5; use DDQN True... Table 2: Hyperparameters for DOM-Q-NET... Table 3: Hyperparameters for Replay Buffer. (A config sketch of these values follows this summary.)
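
As noted in the Pseudocode row, below is a minimal sketch of the multitask loop that Algorithm 1 describes: transitions from several MiniWoB tasks are pooled into one shared replay buffer, and each Q-learning update samples a cross-task batch. The task and agent interfaces (reset, step, act, update) are hypothetical stand-ins, not the paper's actual classes, and uniform sampling stands in for the prioritized replay of the full Rainbow setup.

    import random
    from collections import deque

    class SharedReplayBuffer:
        # One buffer that pools transitions from every MiniWoB task.
        def __init__(self, capacity):
            self.storage = deque(maxlen=capacity)

        def push(self, transition):
            self.storage.append(transition)

        def sample(self, batch_size):
            # Uniform draws; the paper's Rainbow setup would use
            # prioritized replay here instead.
            return random.sample(self.storage, batch_size)

    def multitask_train(tasks, agent, buffer, num_frames, batch_size=128):
        states = [task.reset() for task in tasks]
        for _ in range(num_frames):
            # Round-robin: collect one transition from each task per frame.
            for i, task in enumerate(tasks):
                action = agent.act(states[i])
                next_state, reward, done = task.step(action)
                buffer.push((states[i], action, reward, next_state, done))
                states[i] = task.reset() if done else next_state
            # One gradient update per frame (Table 1) on a cross-task batch.
            if len(buffer.storage) >= batch_size:
                agent.update(buffer.sample(batch_size))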
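
The multi-step learning component referenced in the Software Dependencies row combines Table 1's discount factor (0.99) and bootstrap length (N = 8) into the standard n-step target. This is a generic sketch of that quantity, with bootstrap_q standing in for the target network's value estimate at the n-th state.

    def n_step_target(rewards, bootstrap_q, done, gamma=0.99, n=8):
        # rewards: up to n rewards following the current state;
        # done: whether the episode terminated within those steps.
        target = 0.0
        for k, r in enumerate(rewards[:n]):
            target += (gamma ** k) * r
        if not done:
            # Episode still running after n steps: bootstrap from Q_target.
            target += (gamma ** n) * bootstrap_q
        return target

    # Example: episode ends after two zero-reward steps and a reward of 1.
    assert abs(n_step_target([0.0, 0.0, 1.0], 0.0, done=True) - 0.99 ** 2) < 1e-9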
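
Finally, the Table 1 values quoted in the Experiment Setup row collect naturally into a flat config; the key names below are illustrative, only the values come from the paper.

    # Key names are illustrative; values are taken from Table 1.
    RAINBOW_HPARAMS = {
        "optimizer": "Adam",          # Kingma & Ba, 2014
        "learning_rate": 0.00015,
        "batch_size": 128,
        "discount_factor": 0.99,
        "target_update_period": 200,  # in online-network updates
        "updates_per_frame": 1,
        "exploration_steps": 50,
        "n_step_bootstrap": 8,        # multi-step return length
        "noisy_nets_sigma0": 0.5,
        "use_ddqn": True,
    }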