Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. |
| Researcher Affiliation | Collaboration | 1Beijing Jiaotong University 2Alibaba Group |
| Pseudocode | No | The paper describes the system architecture and agent functionalities in detail but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is open-sourced at https://github.com/X-PLUG/Mobile Agent. |
| Open Datasets | No | The paper states: 'We select 5 system apps and 5 popular external apps for evaluation. For each app, we devise two basic instructions and two advanced instructions... In total, there were 88 instructions for non-English and English scenarios... The apps and instructions used for evaluation in non-English and English scenarios are presented in the appendix.' This refers to a custom-designed set of evaluation tasks, not a publicly available dataset with a link, DOI, or formal citation. |
| Dataset Splits | No | The paper describes a 'dynamic evaluation method' and a set of instructions used for evaluation but does not specify a division of these instructions into explicit training, validation, and test splits for model development or evaluation, as the MLLMs used are pre-trained via API calls. |
| Hardware Specification | No | For the MLLMs, the paper states: 'All calls are made through the official API method provided by the developers.', implying the use of cloud-based APIs without specifying the underlying hardware. No other specific hardware details (e.g., GPU models, CPU types, memory) for running experiments are provided. |
| Software Dependencies | No | The paper mentions specific MLLMs (GPT-4, GPT-4V, Gemini-1.5-Pro, Qwen-VL-Max) and tools (Conv Next Vi T-document, Grounding DINO, Qwen-VL-Int4), but it does not provide specific version numbers for general ancillary software components (e.g., Python, PyTorch, TensorFlow, specific libraries beyond the models themselves). |
| Experiment Setup | Yes | We fix the seed for GPT-4V invocation and set the temperature to 0 to avoid randomness. |