Learning to Navigate the Web

Authors: Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstration, achieving a 100% success rate on several difficult environments. We test the performance of our approaches on a set of Miniwob and Miniwob++ tasks (Liu et al. (2018)). We show that both approaches improve upon a strong baseline and outperform previous state-of-the-art.
Researcher Affiliation | Collaboration | Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur (Google AI); izzeddingur@cs.ucsb.edu, {rueckert,faust}@google.com, dilek@ieee.org
Pseudocode | Yes | Algorithm 1: Curriculum-DQN. Algorithm 2: One-step DQN training. Algorithm 3: Meta-learning for training QWeb.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available.
Open Datasets | Yes | We evaluate our approaches on a number of environments from the Miniwob (Shi et al. (2017)) and Miniwob++ (Liu et al. (2018)) benchmark tasks.
Dataset Splits | No | The paper refers to using benchmark tasks (Miniwob and Miniwob++) but does not specify exact dataset split percentages or counts for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various software components and models (e.g., DQN, QWeb, biLSTM) but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | We cap the number of DOM elements at 100 and the number of fields at 3 for the book-flight-form environment. All the environments return a sparse reward at the end of an episode, with (+1) for successful and (-1) for failed episodes, respectively. We also use a small step penalty (-0.1) to encourage QWeb to find successful episodes using as small a number of actions as possible.
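
To make the reported reward scheme concrete, below is a minimal Python sketch of a per-step reward under the setup described in the last row: a sparse terminal reward of +1 for a successful episode, -1 for a failed one, and a small -0.1 step penalty. The function names are hypothetical, and the assumption that the penalty applies only to non-terminal steps is ours; this is an illustrative sketch, not the authors' implementation.

```python
# Illustrative sketch only (hypothetical names, not the paper's code).
# Sparse terminal reward: +1 success, -1 failure; -0.1 penalty per step.
# Whether the penalty is also added at the terminal step is an assumption here.

STEP_PENALTY = -0.1

def step_reward(done: bool, success: bool) -> float:
    """Per-step reward for one environment transition."""
    if not done:
        return STEP_PENALTY          # small penalty to encourage short episodes
    return 1.0 if success else -1.0  # sparse terminal reward

def episode_return(num_actions: int, success: bool) -> float:
    """Undiscounted return of a full episode under this scheme."""
    terminal = 1.0 if success else -1.0
    return (num_actions - 1) * STEP_PENALTY + terminal
```

Under these assumptions, a successful episode that takes 5 actions would have an undiscounted return of (5 - 1) * (-0.1) + 1.0 = 0.6, so shorter successful episodes score higher.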