Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Thought Communication in Multiagent Collaboration
Authors: Yujia Zheng, Zhuokai Zhao, Zijian Li, Yaqi Xie, Mingze Gao, Lizhu Zhang, Kun Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real-world benchmarks validate the theory and demonstrate the collaborative advantages of thought communication. |
| Researcher Affiliation | Collaboration | 1 CMU, 2 Meta AI, 3 MBZUAI |
| Pseudocode | No | No explicit pseudocode or algorithm block is provided in the paper. The methodology is described in narrative form and supported by diagrams such as Figure 2: Overview of THOUGHTCOMM. |
| Open Source Code | No | The paper does not explicitly state that its own implementation code is open-source or provide a link to a code repository for the methodology described. It only mentions utilizing code from a baseline for comparison: "For baseline comparisons, we utilize the original code released by the authors" [Subramaniam et al., 2025]. |
| Open Datasets | Yes | Therefore, we evaluate THOUGHTCOMM on two widely used math reasoning benchmarks, MATH [Hendrycks et al., 2021] and GSM8K [Cobbe et al., 2021] to assess its real-world effectiveness. |
| Dataset Splits | Yes | Following Subramaniam et al. [2025], we randomly sample 500 examples for fine-tuning the latent communication module, which includes both an autoencoder and an adapter, while reserving another 500 examples for evaluation. |
| Hardware Specification | Yes | All experiments are conducted on a single compute node with 8 NVIDIA H100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for key software components, libraries, or frameworks used for implementation (e.g., PyTorch, Python, CUDA versions). |
| Experiment Setup | No | The paper mentions setting the prefix token count for the method to 1 in Implementation Details, and discusses varying prefix lengths from 1 to 16 in Section 5.4. However, it does not provide specific hyperparameters like learning rate, batch size, number of epochs, or optimizer settings for training the autoencoder and adapter. |
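
The Dataset Splits row describes randomly sampling 500 examples for fine-tuning and reserving another 500 for evaluation. A minimal sketch of such a split follows, assuming the two subsets are disjoint and drawn uniformly at random. The function name `split_fine_tune_eval` and the `seed` parameter are hypothetical; the paper does not report a seed or its sampling code.

```python
import random


def split_fine_tune_eval(examples, n_fine_tune=500, n_eval=500, seed=0):
    """Draw two disjoint random subsets from a list of benchmark examples.

    The seed value is an assumption made for repeatability; it is not
    taken from the paper.
    """
    total = n_fine_tune + n_eval
    if len(examples) < total:
        raise ValueError("not enough examples for the requested split")
    rng = random.Random(seed)
    # Sample indices without replacement so the two subsets cannot overlap.
    picked = rng.sample(range(len(examples)), total)
    fine_tune = [examples[i] for i in picked[:n_fine_tune]]
    evaluation = [examples[i] for i in picked[n_fine_tune:]]
    return fine_tune, evaluation
```

Fixing the seed makes the split repeatable across runs, which is one of the details a reader would need in order to reproduce the reported numbers.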