reproducibilityindex.ai

Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

Authors: Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Arik

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform a comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements by up to 10% over strong baselines of RAG, Full-Context, and multi-agent LLMs.
Researcher Affiliation	Collaboration	Penn State University, Google Cloud AI Research {yfz5488, rmz5227}@psu.edu, {ruoxis, yanfeichen, tpfister, soarik}@google.com
Pseudocode	Yes	Algorithm 1: Chain of Agents (CoA), and Algorithm 2: Chain of Agents (CoA) Input Chunking Algorithm.
Open Source Code	No	We will provide open access to the data and code upon acceptance.
Open Datasets	Yes	We conduct experiments on nine long context datasets across three task types (Table 3): Question Answering. We consider five QA datasets from the LongBench [6] and SCROLL [60].
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets.
Hardware Specification	Yes	For RAG model, we use the model provided by Huggingface and run on A100 GPUs to rerank the chunks.
Software Dependencies	No	The paper mentions using 'Vertex model garden API' and 'Huggingface' models, but does not provide specific version numbers for software libraries or frameworks like Python, PyTorch, or TensorFlow.
Experiment Setup	Yes	Maximum generation token is set to 2048 for gemini-ultra and set to 1024 for the rest of the models. We set temperature to 0 for all experiments except for Self-consistency setting.
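
The decoding settings quoted in the Experiment Setup row can be written down as a small configuration helper. This is only a sketch: the function name, the dictionary keys, and the placeholder model name in the usage note are hypothetical, and the row does not state the temperature used in the self-consistency setting, so the caller has to supply it.

```python
from typing import Optional

# Token limits quoted in the Experiment Setup row.
MAX_TOKENS_GEMINI_ULTRA = 2048
MAX_TOKENS_OTHER_MODELS = 1024


def generation_settings(model: str,
                        self_consistency_temperature: Optional[float] = None) -> dict:
    """Return decoding settings for `model`.

    Hypothetical helper for illustration; it is not code from the paper.
    """
    if model == "gemini-ultra":
        max_tokens = MAX_TOKENS_GEMINI_ULTRA
    else:
        max_tokens = MAX_TOKENS_OTHER_MODELS

    # Temperature is 0 for every experiment except self-consistency.
    # The self-consistency value is not given in the row, so it is an
    # explicit argument rather than a guessed default.
    if self_consistency_temperature is None:
        temperature = 0.0
    else:
        temperature = self_consistency_temperature

    return {
        "model": model,
        "max_output_tokens": max_tokens,
        "temperature": temperature,
    }
```

Calling `generation_settings("gemini-ultra")` gives a 2048-token limit at temperature 0, while any other model name (for example a placeholder such as `"other-model"`) gets the 1024-token limit.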