Autonomous Agents for Collaborative Task under Information Asymmetry
Authors: Wei Liu, Chenxi Wang, YiFei Wang, Zihao Xie, Rennai Qiu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Chen Qian
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes. |
| Researcher Affiliation | Academia | Tsinghua University; Peng Cheng Laboratory, China |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Available at https://github.com/thinkwee/iAgents. |
| Open Datasets | Yes | In this paper, we construct InformativeBench, the first benchmark to evaluate agent collaboration tasks featuring information asymmetry in social networks. It includes two categories with a total of five datasets. Details, including the scale, distribution, and metrics of the datasets, are provided in Section C. What's more, recent studies have found that LLMs continuously ingest internet data, so static benchmarks can be easily memorized and overfitted [64, 55, 61]. Hence, the two pipelines for constructing InformativeBench are easy to realize and can be generalized to more domains for constant and dynamic evaluations. They are the Needle-Oriented and Reasoning-Oriented pipelines, as shown in Figure 4. [...] The SPC dataset [19] is a dialogue dataset based on LLM. [...] The Friends TV dataset reconstructs the social network from the entire Season 1 script of Friends [49], involving 140 characters with 588 relationships, and combines two questions in the Friends QA dataset [57, 24] as [...] |
| Dataset Splits | No | The paper constructs a benchmark for evaluation but does not explicitly state dataset splits (e.g., percentages or counts) for training, validation, or testing for its own experiments on this benchmark. The LLM agents themselves are pre-trained, not trained on this specific benchmark. |
| Hardware Specification | No | The paper mentions the use of LLM backends (e.g., gpt-4-0125-preview, gemini-1.0-pro-latest) and discusses token consumption, but it does not provide specific details about the hardware (CPU, GPU, memory, etc.) on which these experiments were run or the LLMs were accessed/hosted. |
| Software Dependencies | Yes | The experiments use gpt-4-0125-preview, gpt-3.5-turbo-16k, gemini-1.0-pro-latest, and claude-sonnet as LLM backends. [...] For Fuzzy Memory, we use gpt-4-0125-preview to summarize session text and OpenAI text-embedding-3-small to generate embeddings for ANN embedding search. (A hedged sketch of this retrieval step appears below the table.) |
| Experiment Setup | Yes | We conduct all experiments with a maximum of 10 communication turns for agents. The experiments use gpt-4-0125-preview, gpt-3.5-turbo-16k, gemini-1.0-pro-latest, and claude-sonnet as LLM backends. The temperature is set to 0.2. (A turn-capped loop under these settings is sketched after the Fuzzy Memory sketch below.) |
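
The Fuzzy Memory mechanism quoted in the Software Dependencies row pairs LLM summarization with embedding-based retrieval. Below is a minimal sketch of that retrieval step, assuming the OpenAI Python SDK; exact cosine search stands in for the paper's ANN index, and the helper names (`embed`, `summarize`, `retrieve`) and the `top_k` parameter are illustrative, not taken from the paper.

```python
# Hedged sketch of embedding-based memory retrieval, loosely following the
# paper's description of Fuzzy Memory: sessions are summarized by an LLM,
# embedded with text-embedding-3-small, and searched by similarity.
# Exact cosine search is a simplification of the ANN search named in the paper.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(texts: list[str]) -> np.ndarray:
    """Embed texts with text-embedding-3-small (the model named in the paper)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


def summarize(session_text: str) -> str:
    """Summarize one session with gpt-4-0125-preview (the backend the paper names)."""
    resp = client.chat.completions.create(
        model="gpt-4-0125-preview",
        temperature=0.2,  # matches the temperature reported in the setup
        messages=[{"role": "user",
                   "content": f"Summarize this chat session:\n{session_text}"}],
    )
    return resp.choices[0].message.content


def retrieve(query: str, summaries: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k session summaries most similar to the query."""
    vecs = embed(summaries + [query])
    mem, q = vecs[:-1], vecs[-1]
    # Cosine similarity; OpenAI embeddings are near unit-norm, but normalize anyway.
    sims = mem @ q / (np.linalg.norm(mem, axis=1) * np.linalg.norm(q))
    return [summaries[i] for i in np.argsort(sims)[::-1][:top_k]]
```

At the paper's reported scale (nearly 70,000 messages), a true ANN index (e.g., FAISS or HNSW) would replace the brute-force similarity above; the summarize-then-embed structure is the part grounded in the quoted description.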
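For the Experiment Setup row, the following is a minimal sketch of how the two reported constraints (at most 10 communication turns, temperature 0.2) could be enforced in an agent-to-agent loop. The `communicate` function, the prompt wording, and the "TASK SOLVED" termination marker are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of a capped agent-to-agent communication loop under the
# reported setup: at most 10 turns, temperature 0.2, one of the backends
# named in the paper. Stop condition and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
MAX_TURNS = 10      # maximum communication turns reported in the paper
TEMPERATURE = 0.2   # temperature reported in the paper


def agent_reply(message: str, system_prompt: str) -> str:
    """One turn from one agent, using a backend named in the paper."""
    resp = client.chat.completions.create(
        model="gpt-4-0125-preview",
        temperature=TEMPERATURE,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": message}],
    )
    return resp.choices[0].message.content


def communicate(task: str, prompt_a: str, prompt_b: str) -> list[str]:
    """Alternate two agents until the turn cap or an (assumed) completion marker."""
    transcript: list[str] = []
    message = task
    for turn in range(MAX_TURNS):
        prompt = prompt_a if turn % 2 == 0 else prompt_b
        # Only the latest message is passed here for brevity; a fuller
        # implementation would carry the whole conversation history.
        message = agent_reply(message, prompt)
        transcript.append(message)
        if "TASK SOLVED" in message:  # assumed stop condition, not from the paper
            break
    return transcript
```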