Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
Authors: Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Arik
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements of up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs. |
| Researcher Affiliation | Collaboration | Penn State University, Google Cloud AI Research {yfz5488, rmz5227}@psu.edu, {ruoxis, yanfeichen, tpfister, soarik}@google.com |
| Pseudocode | Yes | Algorithm 1: Chain of Agents (CoA) and Algorithm 2: Chain of Agents (CoA) Input Chunking Algorithm (a hedged sketch of this pattern follows the table). |
| Open Source Code | No | We will provide open access to the data and code upon acceptance. |
| Open Datasets | Yes | We conduct experiments on nine long-context datasets across three task types (Table 3): Question Answering. We consider five QA datasets from LongBench [6] and SCROLLS [60]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets. |
| Hardware Specification | Yes | For the RAG model, we use the model provided by Huggingface and run it on A100 GPUs to rerank the chunks. |
| Software Dependencies | No | The paper mentions using the 'Vertex Model Garden API' and 'Huggingface' models, but does not provide specific version numbers for software libraries or frameworks like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The maximum number of generation tokens is set to 2048 for gemini-ultra and to 1024 for the rest of the models. We set temperature to 0 for all experiments except for the self-consistency setting. |
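
Since the paper describes Algorithm 1 (Chain of Agents) and Algorithm 2 (input chunking) only in pseudocode and the code has not been released, the following is a minimal sketch of the chunk-then-chain pattern the paper describes: the long input is split into chunks that fit a worker's context budget, each worker reads its chunk together with the previous worker's message, and a manager agent produces the final answer. The `call_llm` helper, the prompts, and the whitespace-based chunker are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Chain-of-Agents pattern (NOT the authors' released code).
# call_llm is a hypothetical wrapper around whatever LLM API is available.

from typing import Callable, List


def chunk_by_budget(text: str, budget_words: int) -> List[str]:
    """Split the input so each chunk fits a worker's context budget.
    The paper's Algorithm 2 chunks by tokens; whitespace words are a stand-in here."""
    words = text.split()
    return [" ".join(words[i:i + budget_words]) for i in range(0, len(words), budget_words)]


def chain_of_agents(
    source_text: str,
    query: str,
    call_llm: Callable[[str], str],  # hypothetical LLM call: prompt -> completion
    budget_words: int = 3000,
) -> str:
    """Worker agents read one chunk each plus the previous worker's message;
    a manager agent turns the final message into the answer."""
    message = ""  # communication unit passed along the worker chain
    for chunk in chunk_by_budget(source_text, budget_words):
        worker_prompt = (
            f"Previous findings: {message}\n\n"
            f"Text chunk: {chunk}\n\n"
            f"Question: {query}\n"
            "Update the findings with any information in this chunk that helps answer the question."
        )
        message = call_llm(worker_prompt)
    manager_prompt = (
        f"Findings from reading the full document: {message}\n\n"
        f"Question: {query}\nGive the final answer."
    )
    return call_llm(manager_prompt)
```

In practice, `call_llm` would presumably wrap a concrete client (the paper queries models via the Vertex Model Garden API) configured with temperature 0 and a 1024- or 2048-token output budget, matching the setup quoted in the last table row.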