Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent.
Researcher Affiliation | Collaboration | The authors are affiliated with Beijing Jiaotong University and Alibaba Group.
Pseudocode | No | The paper describes the system architecture and agent functionalities in detail but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The code is open-sourced at https://github.com/X-PLUG/MobileAgent.
Open Datasets | No | The paper states: 'We select 5 system apps and 5 popular external apps for evaluation. For each app, we devise two basic instructions and two advanced instructions... In total, there were 88 instructions for non-English and English scenarios... The apps and instructions used for evaluation in non-English and English scenarios are presented in the appendix.' This refers to a custom-designed set of evaluation tasks rather than a publicly available dataset with a link, DOI, or formal citation.
Dataset Splits | No | The paper describes a 'dynamic evaluation method' and a set of instructions used for evaluation, but it does not divide these instructions into explicit training, validation, and test splits; the MLLMs used are pre-trained models accessed through API calls.
Hardware Specification | No | For the MLLMs, the paper states: 'All calls are made through the official API method provided by the developers.', implying the use of cloud-based APIs without specifying the underlying hardware. No other specific hardware details (e.g., GPU models, CPU types, memory) for running the experiments are provided.
Software Dependencies | No | The paper names specific MLLMs (GPT-4, GPT-4V, Gemini-1.5-Pro, Qwen-VL-Max) and tools (ConvNextViT-document, Grounding DINO, Qwen-VL-Int4), but it does not provide version numbers for ancillary software components (e.g., Python, PyTorch, or other libraries beyond the models themselves).
Experiment Setup | Yes | The paper states: 'We fix the seed for GPT-4V invocation and set the temperature to 0 to avoid randomness.'
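The determinism measures quoted above (fixed seed, temperature 0) can be sketched as request parameters for a chat-completions-style API call. This is an illustrative assumption, not the authors' code: the model identifier, seed value, and prompt below are placeholders, and only the temperature/seed settings come from the paper.

```python
# Deterministic invocation settings, following the paper's description:
# "We fix the seed for GPT-4V invocation and set the temperature to 0."
# Model name, seed value, and prompt are hypothetical placeholders.
request_params = {
    "model": "gpt-4-vision-preview",  # assumed model identifier
    "seed": 1234,                     # fixed seed for reproducibility
    "temperature": 0,                 # disable sampling randomness
    "messages": [
        {"role": "user", "content": "Describe the current screenshot."}
    ],
}

# With the official OpenAI Python client, these parameters would be passed as:
#   client.chat.completions.create(**request_params)
print(request_params["seed"], request_params["temperature"])
```

Note that even with a fixed seed and zero temperature, hosted MLLM APIs do not guarantee bit-identical outputs across model versions, so exact reproduction of the reported numbers may still vary.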