Learning to Navigate the Web

Authors: Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstration, achieving a 100% success rate on several difficult environments. We test the performance of our approaches on a set of Miniwob and Miniwob++ tasks (Liu et al. (2018)). We show that both approaches improve upon a strong baseline and outperform previous state-of-the-art.
Researcher Affiliation | Collaboration | Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur (Google AI); izzeddingur@cs.ucsb.edu, {rueckert,faust}@google.com, dilek@ieee.org
Pseudocode | Yes | Algorithm 1: Curriculum-DQN. Algorithm 2: One-step DQN training. Algorithm 3: Meta-learning for training QWeb.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available.
Open Datasets | Yes | We evaluate our approaches on a number of environments from the Miniwob (Shi et al. (2017)) and Miniwob++ (Liu et al. (2018)) benchmark tasks.
Dataset Splits | No | The paper refers to using benchmark tasks (Miniwob and Miniwob++) but does not specify exact dataset split percentages or counts for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions various software components and models (e.g., DQN, QWeb, biLSTM) but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | We cap the number of DOM elements at 100 and the number of fields at 3 for the book-flight-form environment. All the environments return a sparse reward at the end of an episode, with (+1) for successful and (-1) for failed episodes, respectively. We also use a small step penalty (-0.1) to encourage QWeb to find successful episodes using as small a number of actions as possible.
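
To make the reported reward scheme concrete, below is a minimal Python sketch of a per-step reward under the setup described in the last row: a sparse terminal reward of +1 for a successful episode, -1 for a failed one, and a small -0.1 step penalty. The function names are hypothetical, and the assumption that the penalty applies only to non-terminal steps is ours; this is an illustrative sketch, not the authors' implementation.

```python
# Illustrative sketch only (hypothetical names, not the paper's code).
# Sparse terminal reward: +1 success, -1 failure; -0.1 penalty per step.
# Whether the penalty is also added at the terminal step is an assumption here.

STEP_PENALTY = -0.1

def step_reward(done: bool, success: bool) -> float:
    """Per-step reward for one environment transition."""
    if not done:
        return STEP_PENALTY          # small penalty to encourage short episodes
    return 1.0 if success else -1.0  # sparse terminal reward

def episode_return(num_actions: int, success: bool) -> float:
    """Undiscounted return of a full episode under this scheme."""
    terminal = 1.0 if success else -1.0
    return (num_actions - 1) * STEP_PENALTY + terminal
```

Under these assumptions, a successful episode that takes 5 actions would have an undiscounted return of (5 - 1) * (-0.1) + 1.0 = 0.6, so shorter successful episodes score higher.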