A data-driven approach for learning to control computers

Authors: Peter C Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Adam Santoro, Timothy Lillicrap

ICML 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. We collected over 2.4 million demonstrations of the 104 MiniWob++ tasks from a total of 77 human participants, which amounts to approximately 6300 hours.
Researcher Affiliation Industry Peter Humphreys¹, David Raposo¹, Toby Pohlen¹, Gregory Thornton¹, Rachita Chhaparia¹, Alistair Muldal¹, Josh Abramson¹, Petko Georgiev¹, Adam Santoro¹, Timothy Lillicrap¹. ¹DeepMind, London, United Kingdom.
Pseudocode No The paper presents a detailed architectural diagram (Figure 2) for the CC-Net agent but does not include any formal pseudocode or algorithm blocks describing the steps of its methods.
Open Source Code No The paper references third-party tools like Sandbox2 (GitHub repository) and ChromeDriver Security Considerations with URLs, but it does not provide any explicit statement or link to the authors' own open-source code for their described methodology.
Open Datasets Yes A useful benchmark for initial investigations of computer control is the MiniWob++ task suite (Shi et al., 2017; Liu et al., 2018), which comprises a set of instruction-following tasks that require clicking, typing, form-filling, and other such basic computer interactions (Fig. 1b).
Dataset Splits No Before being used for BC, our human demonstration data was split into train and test sets (2.2 million & 310 thousand episodes respectively).
Hardware Specification No The paper mentions ensuring 'sufficient resources on the server running the environment' and that the environment is implemented in C++ for low latency, but it does not provide specific details about the CPU, GPU, or any other hardware used for experiments.
Software Dependencies No The paper mentions using Google Chrome and refers to algorithms like Adam and V-MPO, but it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or browser versions) used in the experiments.
Experiment Setup Yes Table 1. Hyper-parameters used in training. See Song et al. (2019) for descriptions of the V-MPO hyper-parameters.
Optimizer: Adam (Kingma and Ba, 2014)
Learning rate: 1e-4
Adam β1 parameter: 0.9
Adam β2 parameter: 0.999
Weight decay (biases excluded): 1e-1
V-MPO loss weight: 1.0
BC loss weight (baseline): 1.0
V-MPO ε_α: 0.1
V-MPO ε_η: 0.2
Agent discount γ: 0.9
Batch size: 256
Trajectory unroll length: 64
Target-network update period T: 50
Maximum number of steps per episode: 300
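For anyone attempting a reproduction, the reported hyper-parameters can be gathered into a single configuration object. This is a minimal sketch; the key names are illustrative choices of mine, not identifiers from the paper, and only the values come from Table 1.

```python
# Hyper-parameters reported in Table 1 of the paper, collected into a
# plain-Python config dict. Key names are illustrative, not from the paper.
TRAINING_CONFIG = {
    "optimizer": "adam",            # Kingma and Ba, 2014
    "learning_rate": 1e-4,
    "adam_b1": 0.9,
    "adam_b2": 0.999,
    "weight_decay": 1e-1,           # applied to weights only; biases excluded
    "vmpo_loss_weight": 1.0,
    "bc_loss_weight": 1.0,          # baseline behavioural-cloning loss weight
    "vmpo_eps_alpha": 0.1,
    "vmpo_eps_eta": 0.2,
    "agent_discount_gamma": 0.9,
    "batch_size": 256,
    "unroll_length": 64,
    "target_update_period": 50,
    "max_steps_per_episode": 300,
}
```

Keeping the values in one dict like this makes it easy to log the full configuration alongside each training run when checking reproducibility.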