DOM-Q-NET: Grounded RL on Structured Language

Authors: Sheng Jia, Jamie Ryan Kiros, Jimmy Ba

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations. Furthermore, we show 2x improvements in sample efficiency when training in the multi-task setting, allowing our model to transfer learned behaviours across tasks." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Sheng Jia (University of Toronto, Vector Institute) sheng.jia@utoronto.ca; Jamie Kiros (Google Brain) kiros@google.com; Jimmy Ba (University of Toronto, Vector Institute) jba@cs.toronto.ca
Pseudocode | Yes | Algorithm 1: Multitask Learning with Shared Replay Buffer (a minimal code sketch follows this summary)
Open Source Code | Yes | "Reproducibility: Our code and demo are available at https://github.com/Sheng-J/DOM-Q-NET"
Open Datasets | Yes | "We demonstrate the capabilities of our model on the MiniWoB environment where we can match or outperform existing work without the use of expert demonstrations." Shi et al. (2017) constructed the Mini World of Bits (MiniWoB) benchmark, which consists of many toy web-navigation tasks.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or their proportions.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or other specifications of the machines used to run the experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as Dopamine, Adam, Rainbow, DDQN, prioritized replay, multi-step learning (an n-step target sketch follows this summary), and Noisy Nets, but does not give version numbers for software dependencies such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | Section 6.1, Hyperparameters. Table 1 (training with Rainbow DQN, 4 components): optimization algorithm Adam (Kingma & Ba, 2014); learning rate 0.00015; batch size 128; discount factor 0.99; DQN target network update period 200 online-network updates; number of updates per frame 1; number of exploration steps 50; N-step (multi-step) bootstrap 8; Noisy Nets σ0 0.5; use DDQN True... Table 2: Hyperparameters for DOM-Q-NET... Table 3: Hyperparameters for Replay Buffer. (A config sketch of these values follows this summary.)
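
As noted in the Pseudocode row, below is a minimal sketch of the multitask loop that Algorithm 1 describes: transitions from several MiniWoB tasks are pooled into one shared replay buffer, and each Q-learning update samples a cross-task batch. The task and agent interfaces (reset, step, act, update) are hypothetical stand-ins, not the paper's actual classes, and uniform sampling stands in for the prioritized replay of the full Rainbow setup.

    import random
    from collections import deque

    class SharedReplayBuffer:
        # One buffer that pools transitions from every MiniWoB task.
        def __init__(self, capacity):
            self.storage = deque(maxlen=capacity)

        def push(self, transition):
            self.storage.append(transition)

        def sample(self, batch_size):
            # Uniform draws; the paper's Rainbow setup would use
            # prioritized replay here instead.
            return random.sample(self.storage, batch_size)

    def multitask_train(tasks, agent, buffer, num_frames, batch_size=128):
        states = [task.reset() for task in tasks]
        for _ in range(num_frames):
            # Round-robin: collect one transition from each task per frame.
            for i, task in enumerate(tasks):
                action = agent.act(states[i])
                next_state, reward, done = task.step(action)
                buffer.push((states[i], action, reward, next_state, done))
                states[i] = task.reset() if done else next_state
            # One gradient update per frame (Table 1) on a cross-task batch.
            if len(buffer.storage) >= batch_size:
                agent.update(buffer.sample(batch_size))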
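
The multi-step learning component referenced in the Software Dependencies row combines Table 1's discount factor (0.99) and bootstrap length (N = 8) into the standard n-step target. This is a generic sketch of that quantity, with bootstrap_q standing in for the target network's value estimate at the n-th state.

    def n_step_target(rewards, bootstrap_q, done, gamma=0.99, n=8):
        # rewards: up to n rewards following the current state;
        # done: whether the episode terminated within those steps.
        target = 0.0
        for k, r in enumerate(rewards[:n]):
            target += (gamma ** k) * r
        if not done:
            # Episode still running after n steps: bootstrap from Q_target.
            target += (gamma ** n) * bootstrap_q
        return target

    # Example: episode ends after two zero-reward steps and a reward of 1.
    assert abs(n_step_target([0.0, 0.0, 1.0], 0.0, done=True) - 0.99 ** 2) < 1e-9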
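
Finally, the Table 1 values quoted in the Experiment Setup row collect naturally into a flat config; the key names below are illustrative, only the values come from the paper.

    # Key names are illustrative; values are taken from Table 1.
    RAINBOW_HPARAMS = {
        "optimizer": "Adam",          # Kingma & Ba, 2014
        "learning_rate": 0.00015,
        "batch_size": 128,
        "discount_factor": 0.99,
        "target_update_period": 200,  # in online-network updates
        "updates_per_frame": 1,
        "exploration_steps": 50,
        "n_step_bootstrap": 8,        # multi-step return length
        "noisy_nets_sigma0": 0.5,
        "use_ddqn": True,
    }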