Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. |
| Researcher Affiliation | Collaboration | Beijing Jiaotong University; Alibaba Group |
| Pseudocode | No | The paper describes the system architecture and agent functionalities in detail but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is open-sourced at https://github.com/X-PLUG/MobileAgent. |
| Open Datasets | No | The paper states: 'We select 5 system apps and 5 popular external apps for evaluation. For each app, we devise two basic instructions and two advanced instructions... In total, there were 88 instructions for non-English and English scenarios... The apps and instructions used for evaluation in non-English and English scenarios are presented in the appendix.' This refers to a custom-designed set of evaluation tasks, not a publicly available dataset with a link, DOI, or formal citation. |
| Dataset Splits | No | The paper describes a 'dynamic evaluation method' and a set of instructions used for evaluation, but it does not divide these instructions into explicit training, validation, and test splits; the MLLMs used are pre-trained models accessed via API calls, so no splits for model development apply. |
| Hardware Specification | No | For the MLLMs, the paper states: 'All calls are made through the official API method provided by the developers', implying the use of cloud-based APIs without specifying the underlying hardware. No other specific hardware details (e.g., GPU models, CPU types, memory) for running the experiments are provided. |
| Software Dependencies | No | The paper mentions specific MLLMs (GPT-4, GPT-4V, Gemini-1.5-Pro, Qwen-VL-Max) and tools (ConvNextViT-document, Grounding DINO, Qwen-VL-Int4), but it does not provide specific version numbers for general ancillary software components (e.g., Python, PyTorch, TensorFlow, specific libraries beyond the models themselves). |
| Experiment Setup | Yes | The paper states: 'We fix the seed for GPT-4V invocation and set the temperature to 0 to avoid randomness.' (An illustrative sketch of such a call follows the table.) |
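
The determinism controls quoted in the Experiment Setup row (fixed seed, temperature 0) map directly onto parameters of the Chat Completions API. The sketch below shows what such an invocation might look like using the OpenAI Python SDK; the model name, seed value, prompt, and image URL are illustrative placeholders, not details taken from the paper.

```python
# Minimal sketch of a deterministic GPT-4V-style API call, assuming the
# OpenAI Python SDK. The model name, seed, prompt, and image URL are
# hypothetical placeholders, not values reported in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed GPT-4V endpoint
    temperature=0,                 # suppress sampling randomness
    seed=1234,                     # fixed seed, mirroring the paper's setup
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the next UI action for this screenshot."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Note that even with `temperature=0` and a fixed `seed`, the API documents determinism as best-effort, so exact reproducibility of the paper's runs is not guaranteed.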