Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Authors: Longtao Zheng, Rundong Wang, Xinrun Wang, Bo An
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SYNAPSE on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, SYNAPSE achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, SYNAPSE is the first ICL method to solve the book-flight task in MiniWoB++. SYNAPSE also exhibits a 56% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web. |
| Researcher Affiliation | Collaboration | ¹Nanyang Technological University, Singapore; ²Skywork AI, Singapore |
| Pseudocode | No | The paper describes the system's components and process flow through natural language and examples of prompts, but it does not include a formally structured pseudocode or algorithm block. |
| Open Source Code | Yes | To ensure reproducibility, all resources such as code, prompts, and agent trajectories have been made publicly available at https://ltzheng.github.io/Synapse. |
| Open Datasets | Yes | We evaluate SYNAPSE on two benchmarks: MiniWoB++ (Shi et al., 2017; Liu et al., 2018), a standard research task suite, and Mind2Web (Deng et al., 2023), a dataset across diverse domains of real-world web navigation. |
| Dataset Splits | No | The paper refers to a 'training set' and 'test sets' for the Mind2Web dataset but does not explicitly describe a validation set or provide detailed split information needed for reproducibility. |
| Hardware Specification | No | The paper mentions using specific LLMs such as GPT-3.5 (via API) and Code Llama-7B, but it does not provide hardware details such as GPU models, CPU types, or cloud computing specifications used to run its experiments. |
| Software Dependencies | Yes | In the MiniWoB++ experiments, we query gpt-3.5-turbo-0301... For Mind2Web, the default LLM is gpt-3.5-turbo-16k-0613. We use text-embedding-ada-002 as the embedding model. |
| Experiment Setup | Yes | We configure the temperature to 0, i.e., greedy decoding. ... Specifically, we set k to 3 and 5 for the previous and current observations, respectively. ... We retrieve the top three exemplars from memory and use the most common one to retrieve its exemplars... |
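
The Software Dependencies and Experiment Setup rows above describe an embedding-based exemplar memory (text-embedding-ada-002), top-3 exemplar retrieval, and greedy decoding (temperature 0) with gpt-3.5-turbo-16k-0613. The sketch below illustrates how such a setup could be wired together; it is not the authors' implementation (their code is at https://ltzheng.github.io/Synapse), and the memory layout, task strings, and helper names here are assumptions for illustration only.

```python
"""Minimal sketch (assumptions, not the authors' code) of embedding-based
exemplar retrieval and the quoted LLM query settings."""

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical in-memory exemplar store: a task description used as the
# retrieval key, plus the trajectory text prepended as an in-context exemplar.
MEMORY = [
    {"task": "book a flight from A to B", "trajectory": "...exemplar trajectory..."},
    {"task": "click the submit button", "trajectory": "...exemplar trajectory..."},
    {"task": "fill in the login form", "trajectory": "...exemplar trajectory..."},
]

def embed(text: str) -> np.ndarray:
    """Embed text with text-embedding-ada-002, as named in the paper."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=[text])
    return np.asarray(resp.data[0].embedding)

def retrieve_exemplars(query: str, k: int = 3) -> list[str]:
    """Return the k most similar exemplar trajectories by cosine similarity
    (k=3 matches the 'top three exemplars' in the Experiment Setup row)."""
    q = embed(query)
    sims = []
    for item in MEMORY:
        e = embed(item["task"])
        sims.append(float(q @ e / (np.linalg.norm(q) * np.linalg.norm(e))))
    top = np.argsort(sims)[::-1][:k]
    return [MEMORY[i]["trajectory"] for i in top]

def query_llm(prompt: str) -> str:
    """Greedy decoding: temperature 0, as stated in the Experiment Setup row."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo-16k-0613",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

# Usage: prepend the retrieved exemplar trajectories to the current task prompt.
exemplars = retrieve_exemplars("book a one-way flight", k=3)
print(query_llm("\n\n".join(exemplars) + "\n\nCurrent task: book a one-way flight"))
```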