Learning to Navigate the Web
Authors: Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the ability of our agent to generalize to new instructions on the World of Bits benchmark, on forms with up to 100 elements, supporting 14 million possible instructions. The QWeb agent outperforms the baseline without using any human demonstration, achieving a 100% success rate on several difficult environments. We test the performance of our approaches on a set of Miniwob and Miniwob++ tasks (Liu et al. (2018)). We show that both approaches improve upon a strong baseline and outperform the previous state-of-the-art. |
| Researcher Affiliation | Collaboration | Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur Google AI izzeddingur@cs.ucsb.edu, {rueckert,faust}@google.com, dilek@ieee.org |
| Pseudocode | Yes | Algorithm 1 Curriculum-DQN. Algorithm 2 One-step DQN training. Algorithm 3 Meta-learning for training QWeb. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available. |
| Open Datasets | Yes | We evaluate our approaches on a number of environments from Miniwob (Shi et al. (2017)) and Miniwob++ (Liu et al. (2018)) benchmark tasks. |
| Dataset Splits | No | The paper refers to using benchmark tasks (Miniwob and Miniwob++) but does not specify exact dataset split percentages or counts for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., DQN, QWeb, bidirectional LSTM) but does not provide specific version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | We cap the number of DOM elements at 100 and the number of fields is 3 for book-flight-form environment. All the environments return a sparse reward at the end of an episode with (+1) for successful and (-1) for failure episodes, respectively. We also use a small step penalty (-0.1) to encourage QWeb to find successful episodes using as small number of actions as possible. |
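The reward scheme quoted in the Experiment Setup row can be sketched as a small helper. This is a minimal illustration, not the paper's implementation; the function name `step_reward` and the signature are hypothetical, while the constants (+1, -1, -0.1) come directly from the paper's description.

```python
STEP_PENALTY = -0.1  # small per-step penalty that encourages short successful episodes

def step_reward(done: bool, success: bool) -> float:
    """Reward for one environment step under the paper's sparse scheme.

    Non-terminal steps receive only the step penalty; the terminal step
    receives +1 for a successful episode and -1 for a failed one.
    """
    if done:
        return 1.0 if success else -1.0
    return STEP_PENALTY
```

Because the terminal signal is sparse, an agent only learns whether an entire episode succeeded, which is one reason the paper pairs this reward with curriculum learning and reward augmentation.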