Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

Authors: Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Arik

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements of up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs.
Researcher Affiliation | Collaboration | Penn State University; Google Cloud AI Research. {yfz5488, rmz5227}@psu.edu, {ruoxis, yanfeichen, tpfister, soarik}@google.com
Pseudocode | Yes | Algorithm 1: Chain of Agents (CoA), and Algorithm 2: Chain of Agents (CoA) Input Chunking Algorithm.
Open Source Code | No | We will provide open access to the data and code upon acceptance.
Open Datasets | Yes | We conduct experiments on nine long-context datasets across three task types (Table 3). Question Answering: we consider five QA datasets from LongBench [6] and SCROLLS [60].
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification | Yes | For the RAG model, we use the model provided by Huggingface and run on A100 GPUs to rerank the chunks.
Software Dependencies | No | The paper mentions using the Vertex Model Garden API and Huggingface models, but does not provide specific version numbers for software libraries or frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The maximum generation token count is set to 2048 for gemini-ultra and to 1024 for the rest of the models. We set temperature to 0 for all experiments except for the Self-consistency setting.
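To make the pseudocode and setup rows concrete, here is a minimal sketch of the CoA pipeline the paper describes: worker agents each read one input chunk and pass an accumulated communication unit along the chain, and a manager agent produces the final answer from the last unit. The `call_llm` stub, the prompt wording, and the word-count chunker are placeholders of our own, not the authors' prompts, API, or Algorithm 2; a real run would back `call_llm` with an LLM call configured as in the paper (temperature 0, a 1024/2048 max-token limit).

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g. an API request with
    # temperature=0 and a max-output-token limit, per the paper's setup).
    return f"[summary of: {prompt[:40]}...]"

def chunk_text(text: str, budget: int) -> list[str]:
    # Naive word-count chunking for illustration; the paper's Algorithm 2
    # is a token-budget-aware input chunking procedure.
    words = text.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]

def chain_of_agents(source: str, query: str, budget: int = 100) -> str:
    cu = ""  # communication unit handed from worker to worker
    for chunk in chunk_text(source, budget):
        # Each worker agent reads one chunk plus the previous unit.
        cu = call_llm(
            f"Previous evidence: {cu}\nChunk: {chunk}\n"
            f"Question: {query}\nUpdate the evidence."
        )
    # Manager agent synthesizes the final answer from the last unit.
    return call_llm(f"Evidence: {cu}\nQuestion: {query}\nAnswer:")
```

The key design point this sketch illustrates is that no single agent ever sees the full context: each worker's input stays within its budget, so arbitrarily long inputs are handled by lengthening the chain rather than the prompt.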