World of Bits: An Open-Domain Platform for Web-Based Agents

Authors: Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, Percy Liang

ICML 2017

Each entry below gives a reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "Finally, we show that agents trained via behavioral cloning and reinforcement learning can complete a range of web-based tasks." [...] "Our goal in this section is to establish baselines that current techniques provide on web environments, and highlight the challenges for future work in this area." (Section 4, Experiments)
Researcher Affiliation: Collaboration. "¹Stanford University, Stanford, USA ²OpenAI, San Francisco, USA. Correspondence to: Tianlin (Tim) Shi <tianlin@cs.stanford.edu>."
Pseudocode: No. The paper describes methods such as behavior cloning and reinforcement learning (A3C) but does not provide structured pseudocode or algorithm blocks.
Open Source Code: No. The paper states: "To interact with a web browser, we developed our platform on top of OpenAI Universe (http://universe.openai.com/)", which refers to a third-party platform the authors built on, not a release of their own source code. There is no explicit statement or link providing access to their implementation.
Open Datasets: No. The paper describes the creation of the MiniWoB, FormWoB, and QAWoB datasets and details their characteristics and collection methods (e.g., "Our crowdsourced QAWoB dataset has 521 query templates."). However, it does not provide any link, DOI, repository name, or formal citation for public access to these datasets.
Dataset Splits: No. The paper states: "We split the tasks on each website into 80% for training, and 20% for testing." It does not mention a separate validation split.
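
For concreteness, a minimal sketch of the per-website 80/20 split that quote describes is given below. The website and task names are placeholders, and the shuffling and seed are assumptions; the paper does not say how the split was drawn.

```python
import random

def split_tasks(tasks_by_website, train_frac=0.8, seed=0):
    """Split the tasks on each website into 80% train / 20% test (no validation set)."""
    rng = random.Random(seed)          # seed is an assumption; the paper gives none
    train, test = [], []
    for site, tasks in tasks_by_website.items():
        tasks = list(tasks)
        rng.shuffle(tasks)             # random assignment is assumed, not stated
        cut = int(round(train_frac * len(tasks)))
        train += [(site, t) for t in tasks[:cut]]
        test += [(site, t) for t in tasks[cut:]]
    return train, test

# Placeholder task inventory standing in for the FormWoB/QAWoB websites.
train, test = split_tasks({"united.com": [f"task-{i}" for i in range(10)]})
```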
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or machine specifications) used to run its experiments.
Software Dependencies: No. The paper mentions software components such as "OpenAI Universe", "Gym", and a "Chrome browser inside a Docker container", as well as optimization algorithms such as "Adam" and "A3C", but it does not give version numbers for any of these dependencies.
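
To illustrate the stack the paper names (Gym, OpenAI Universe, and a Dockerized Chrome browser), a minimal driver loop might look like the sketch below. The environment id, configure arguments, and pointer coordinates are illustrative assumptions; the paper pins none of these versions or identifiers.

```python
# Minimal sketch of driving a WoB task through Gym + OpenAI Universe, which
# proxies actions to a Chrome browser running inside a Docker container.
# The environment id, fps, and pointer coordinates are illustrative.
import gym
import universe  # importing registers the VNC/WoB environments with Gym

env = gym.make("wob.mini.ClickTest-v0")   # hypothetical MiniWoB task id
env.configure(remotes=1, fps=12)          # one Dockerized Chrome remote at 12 FPS
observations = env.reset()

for _ in range(100):
    # Universe actions are lists of VNC events (mouse moves, clicks, key presses),
    # one action list per remote environment.
    actions = [[universe.spaces.PointerEvent(80, 125, buttonmask=0)]
               for _ in observations]
    observations, rewards, dones, infos = env.step(actions)
```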
Experiment Setup: Yes. "We obtain a behavior cloning policy by training on the demonstrations using Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 and batch size of 32. We achieved better results by weighing click and keyboard event losses (which are rare compared to move events) 10 times higher in the objective. [...] We run 12 environments in parallel at 12 FPS for up to 1 million steps and perform an update every 200 time steps (i.e. training batches have size 12 × 200 = 2400 steps) with Adam and a learning rate of 10^-4. [...] We use similar supervised learning setting as in MiniWoB, except the learning rate is 10^-4 and the keyboard event losses are weighted 20 times higher. For every episode, we sample randomly from the set of queries and run the model at 8 FPS."
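
As a concrete reading of the behavior-cloning portion of that setup (Adam at 10^-3, batch size 32, click/keyboard losses weighted 10 times higher than move losses), a minimal supervised update could be sketched as follows. The network architecture, event encoding, and dummy batch are placeholders rather than the authors' model, and PyTorch is an assumed framework.

```python
# Behavior-cloning sketch reflecting the quoted hyperparameters: Adam with
# learning rate 1e-3, batch size 32, and rare click/keyboard event losses
# up-weighted 10x relative to mouse-move losses. All shapes are illustrative.
import torch
import torch.nn as nn

MOVE, CLICK, KEY = 0, 1, 2                   # illustrative event-type encoding

class Policy(nn.Module):
    def __init__(self, obs_dim=64, n_event_types=3, n_targets=160 * 210):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.event_head = nn.Linear(128, n_event_types)   # move / click / key
        self.target_head = nn.Linear(128, n_targets)      # e.g. one bin per screen coordinate

    def forward(self, obs):
        h = self.body(obs)
        return self.event_head(h), self.target_head(h)

policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss(reduction="none")

def bc_step(obs, event_type, target):
    """One supervised update on a batch of 32 demonstration frames."""
    event_logits, target_logits = policy(obs)
    loss = ce(event_logits, event_type) + ce(target_logits, target)
    # Click and keyboard events are up-weighted 10x, as described in the quote.
    weight = torch.where(event_type == MOVE, torch.tensor(1.0), torch.tensor(10.0))
    loss = (weight * loss).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for recorded demonstrations.
obs = torch.randn(32, 64)
event_type = torch.randint(0, 3, (32,))
target = torch.randint(0, 160 * 210, (32,))
print(bc_step(obs, event_type, target))
```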