Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

MAT-Agent: Adaptive Multi-Agent Training Optimization

Authors: Jusheng Zhang, Kaitong Cai, Yijia Fan, Ningyuan Liu, Keze Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across Pascal VOC, COCO, and VG-256 demonstrate MAT-Agent's superiority: it achieves an mAP of 97.4 (vs. 96.2 for PAT-T), OF1 of 92.3, and CF1 of 91.4 on Pascal VOC; an mAP of 92.8 (vs. 92.0 for HSQ-CvN), OF1 of 88.2, and CF1 of 87.1 on COCO; and an mAP of 60.9, OF1 of 70.8, and CF1 of 61.1 on VG-256.
Researcher Affiliation	Academia	Jusheng Zhang1, Kaitong Cai1, Yijia Fan1, Ningyuan Liu1, Keze Wang1; 1Sun Yat-sen University. Corresponding author: EMAIL
Pseudocode	No	The paper describes the methodology in Section 3 using mathematical formulations and descriptive text, accompanied by Figure 1 showing the framework. There is no explicit section or block labeled "Pseudocode" or "Algorithm" containing structured steps for a method.
Open Source Code	No	The code is not publicly available via a link in the current submission to preserve anonymity, but detailed implementation descriptions are included in the Supplementary Material.
Open Datasets	Yes	To comprehensively evaluate the performance of MAT-Agent in multi-label image classification, we conduct extensive comparative experiments on three representative datasets: Pascal VOC[63], MS-COCO[64], and Visual Genome (VG-256)[65].
Dataset Splits	Yes	To investigate the efficient knowledge transfer capability of the MAT-Agent model in scenarios with limited data (specifically, the transfer efficiency on small datasets), we conduct systematic investigations based on three datasets: VOC, NUS-WIDE, and Open Images. Specifically, we first set a target mAP for each dataset: 80 for the VOC dataset, and 60 for both NUS-WIDE and Open Images. Subsequently, the model is pretrained on the MS-COCO dataset and then fine-tuned using the target dataset. We record the number of epochs required for the model to reach the target mAP. To further demonstrate the performance of MAT-Agent comprehensively, we perform systematic comparisons between MAT-Agent and mainstream methods (i.e., PBT, BOHB, and DARTS).
Hardware Specification	Yes	All experiments use a ResNet-101 backbone, batch size 64, for 50 epochs, averaged over three runs on a single NVIDIA A100 GPU.
Software Dependencies	No	The paper mentions various components and optimizers like AdamW, but it does not specify version numbers for any software libraries, programming languages, or frameworks used in the implementation (e.g., PyTorch version, Python version).
Experiment Setup	Yes	All experiments use a ResNet-101 backbone, batch size 64, for 50 epochs, averaged over three runs on a single NVIDIA A100 GPU. We use AdamW, ε-greedy decay from 1.0 to 0.1, a replay buffer of 50,000, target network updates every 1,000 steps, intrinsic-reward weight λi = 0.1, and extrinsic-reward weight λe = 1.0.
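
For readers who want to carry the reported setup into their own reimplementation, the hyperparameters quoted in the Experiment Setup row can be collected in one place. The sketch below is illustrative only and is not the authors' code: the class and function names are hypothetical, the linear shape of the ε decay and the `decay_steps` argument are assumptions (the quoted text gives only the start and end values), and the weighted-sum form of the reward is an assumed reading of the two weights λe and λi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    backbone: str = "ResNet-101"
    batch_size: int = 64
    epochs: int = 50
    num_runs: int = 3
    optimizer: str = "AdamW"
    epsilon_start: float = 1.0
    epsilon_end: float = 0.1
    replay_buffer_size: int = 50_000
    target_update_every: int = 1_000   # steps between target network updates
    lambda_intrinsic: float = 0.1      # intrinsic-reward weight (lambda_i)
    lambda_extrinsic: float = 1.0      # extrinsic-reward weight (lambda_e)


def epsilon_at(step: int, cfg: ReportedSetup, decay_steps: int) -> float:
    """Epsilon at a given step.

    ASSUMPTION: linear decay over `decay_steps`; the source states only
    the endpoints (1.0 and 0.1), not the schedule shape or its length.
    """
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    frac = min(max(step, 0), decay_steps) / decay_steps
    return cfg.epsilon_start + frac * (cfg.epsilon_end - cfg.epsilon_start)


def combined_reward(extrinsic: float, intrinsic: float, cfg: ReportedSetup) -> float:
    """ASSUMPTION: the total reward is a weighted sum of the two terms."""
    return cfg.lambda_extrinsic * extrinsic + cfg.lambda_intrinsic * intrinsic


def should_sync_target(step: int, cfg: ReportedSetup) -> bool:
    """True on every `target_update_every`-th step (step 0 excluded)."""
    return step > 0 and step % cfg.target_update_every == 0
```

A caller would pick its own `decay_steps`, for example `epsilon_at(step, ReportedSetup(), decay_steps=10_000)`. That value is a placeholder, not a figure from the paper.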