Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration
Authors: Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x. |
| Researcher Affiliation | Academia | Department of Computer Science, Department of Statistics Stanford University, Stanford, CA 94305, USA |
| Pseudocode | No | The paper describes the workflow-guided exploration (WGE) framework in numbered steps and various models, but it does not present formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code and data are available at https://github.com/stanfordnlp/wge. |
| Open Datasets | Yes | The public MiniWoB benchmark contains 80 tasks. We filtered for the 40 tasks that only require actions in our action space... For each task, we used Amazon Mechanical Turk to collect 10 demonstrations, which record all mouse and keyboard events along with the state of the DOM when each event occurred. Our code and data are available at https://github.com/stanfordnlp/wge. |
| Dataset Splits | Yes | For DOMNET+BC+RL and DOMNET+WGE, we report the test success rate at the time step where the success rate on a validation set reaches its maximum. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions 'Selenium web driver interface' but does not specify any version numbers for Selenium or other software dependencies. |
| Experiment Setup | Yes | Each task contains a 160px × 210px environment and a goal specified in text. ... For each task, we used Amazon Mechanical Turk to collect 10 demonstrations... During behavioral cloning, we apply early stopping based on the reward on a validation set. |
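As the table notes, the paper describes workflow-guided exploration (WGE) in numbered steps rather than formal pseudocode. The core loop — induce action-set constraints ("workflows") from a demonstration, explore only within those constraints, and keep successful episodes as training data for a neural policy — can be sketched as follows. This is a minimal toy illustration, not the authors' implementation; the environment, the `GOAL` sequence, and the workflow structure are all hypothetical stand-ins for the real DOM-based tasks.

```python
import random

random.seed(0)

# Hidden correct action sequence for a toy task (stand-in for a MiniWoB episode).
GOAL = ["click_button", "type_text", "submit"]

def run_episode(actions):
    """Sparse reward: 1 if the full action sequence solves the task, else 0."""
    return 1 if actions == GOAL else 0

# A "workflow" induced from one demonstration: at each time step, a small
# set of plausible abstract actions that the exploration policy may choose from.
workflow = [
    {"click_button", "click_link"},
    {"type_text", "click_button"},
    {"submit", "type_text"},
]

def workflow_policy(workflow):
    """Sample an episode uniformly, but only within the workflow's constraints."""
    return [random.choice(sorted(step)) for step in workflow]

# Workflow-guided exploration: because the search space is constrained,
# successful episodes are found far more cheaply than with unconstrained
# random exploration over the full action space.
replay_buffer = []
for _ in range(200):
    episode = workflow_policy(workflow)
    if run_episode(episode) == 1:
        replay_buffer.append(episode)

print(len(replay_buffer))  # number of discovered successes
```

In the paper's full framework, the episodes collected this way supervise a separate neural policy (trained on the discovered successes, then refined with RL); the sketch stops at the exploration stage, which is where the 100x sample-efficiency gain over plain behavioral cloning is claimed.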