Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Authors: Yaolun Zhang, Xiaogeng Liu, Chaowei Xiao
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and can achieve a comparable performance with the human-designed multi-agent system, which is optimized for those specific tasks. The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison, Madison, US. Correspondence to: Chaowei Xiao <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: FSM State Optimization; Algorithm 2: Deployment Stage |
| Open Source Code | Yes | The code can be found at: https://github.com/SaFoLab-WISC/MetaAgent/. |
| Open Datasets | Yes | Firstly, we compare MetaAgent with other prompt-based methods on Trivial Creative Writing (Wang et al., 2024d) and GPQA (Rein et al., 2023). Machine Learning Bench (ml bench) (Hong et al., 2024a) is a benchmark that requires agents to train a machine-learning model for regression or classification. |
| Dataset Splits | Yes | `# Load the dataset` `train_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_train.csv"` `eval_data_path = "/Users/a11/Desktop/MetaAgent/MetaAgent/ml_benchmark/04_titanic/split_eval.csv"` |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used to run its experiments, such as GPU or CPU models. It only mentions the foundation model used (GPT-4o). |
| Software Dependencies | No | The paper lists several software libraries used in the code example (pandas, sklearn, etc.) but does not provide specific version numbers for these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We selected GPT-4o as the foundation model in the main experiments and set the temperature to 0 to ensure reproducibility. `model = RandomForestClassifier(n_estimators=100, random_state=0)` `model = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', RandomForestClassifier(random_state=42))])` |
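
The Software Dependencies row notes that libraries are named without version numbers. As a minimal sketch of how that gap could be closed, the snippet below records the installed version of each dependency in `requirements.txt` style using only the Python standard library. The function name `pinned_requirements` and the distribution names `pandas` and `scikit-learn` are assumptions made for illustration (the table says only "pandas, sklearn, etc."); this is not part of the paper or its repository.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return requirements.txt-style lines for the given distribution names.

    Packages that are not installed are reported as comment lines instead
    of being given a guessed version.
    """
    lines = []
    for name in packages:
        try:
            # metadata.version looks up the installed distribution's version.
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


if __name__ == "__main__":
    # Hypothetical dependency list; adjust to the libraries actually imported.
    for line in pinned_requirements(["pandas", "scikit-learn"]):
        print(line)
```

Saving this output next to the experiment logs, together with the fixed `random_state` values and the temperature setting quoted above, would let a reader rebuild a matching environment.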