AutoSurvey: Large Language Models Can Automatically Write Surveys
Authors: Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness. We conduct comprehensive experiments to evaluate the performance of AutoSurvey, comparing it against traditional methods for generating survey papers. |
| Researcher Affiliation | Collaboration | Yidong Wang1,2, Qi Guo2,3, Wenjin Yao2, Hongbo Zhang1, Xin Zhang4, Zhen Wu3, Meishan Zhang4, Xinyu Dai3, Min Zhang4, Qingsong Wen5, Wei Ye2, Shikun Zhang2, Yue Zhang1. Affiliations: 1Westlake University, 2Peking University, 3Nanjing University, 4Harbin Institute of Technology, Shenzhen, 5Squirrel AI |
| Pseudocode | Yes | The pseudocode of AutoSurvey can be found in Algorithm 1 (AUTOSURVEY: Automated Survey Creation Using LLMs). |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] |
| Open Datasets | Yes | For AutoSurvey, we utilize a corpus of 530,000 computer science papers from arXiv as the retrieval database. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] |
| Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce a training/validation/test partition for any model developed in the paper; it instead evaluates LLM-generated surveys against human-written surveys. |
| Hardware Specification | No | The paper mentions the use of specific LLM APIs (Claude-3-Haiku, GPT-4, Gemini-1.5-Pro) but does not provide details on the underlying hardware (e.g., GPU models, CPU types, memory) used to run these models or the AutoSurvey framework itself. |
| Software Dependencies | Yes | We adopt nomic-embed-text-v1.5 [49], a widely used embedding model in RAG applications. |
| Experiment Setup | Yes | For the drafting phase of AutoSurvey, we utilize Claude-3-Haiku... For evaluations, we employ a combination of GPT-4, Claude-3-Haiku, and Gemini-1.5-Pro. For AutoSurvey, we utilize a corpus of 530,000 computer science papers from arXiv as the retrieval database. During the initial drafting stage, we retrieve 1200 papers relevant to the given topic and split them into several chunks with a window size of 30,000 tokens. The outline predetermines the number of sections as 8. For subsection drafting, the models generate specific sections using the outline and 60 papers retrieved based on the subsection descriptions, focusing on the main body of each paper (up to the first 1,500 tokens). The iteration number N is set to 2. When calling the API, we set temperature = 1 and other parameters as default. |
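
The Software Dependencies and Experiment Setup rows above describe a concrete retrieval-and-drafting configuration. The sketch below is a minimal Python reading of that configuration, not the authors' released code: `retrieve`, `call_llm`, `chunk_by_tokens`, and `truncate_tokens` are hypothetical placeholders (in the paper's setting, retrieval would be backed by nomic-embed-text-v1.5 embeddings over the 530,000-paper arXiv corpus and drafting by the Claude-3-Haiku API), and only the numeric settings are taken from the quoted text.

```python
# Minimal sketch of the quoted AutoSurvey configuration (illustrative only).
# retrieve/call_llm are injected placeholders; only the numeric parameters
# below come from the paper's experiment setup.
from typing import Callable, Sequence

TOPIC_RETRIEVAL_K = 1200       # papers retrieved for the initial drafting stage
CHUNK_WINDOW_TOKENS = 30_000   # window size used to split the retrieved papers
NUM_SECTIONS = 8               # the outline predetermines eight sections
SUBSECTION_RETRIEVAL_K = 60    # papers retrieved per subsection description
PAPER_BODY_TOKENS = 1_500      # only the first 1,500 tokens of each paper are used
NUM_ITERATIONS = 2             # refinement passes (N = 2)
TEMPERATURE = 1.0              # API temperature; a real call_llm would pass this on


def truncate_tokens(text: str, limit: int) -> str:
    """Crude whitespace truncation standing in for a real tokenizer."""
    return " ".join(text.split()[:limit])


def chunk_by_tokens(papers: Sequence[str], window: int) -> list[str]:
    """Pack papers into chunks of roughly `window` whitespace tokens each."""
    chunks, current, size = [], [], 0
    for paper in papers:
        n = len(paper.split())
        if current and size + n > window:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paper)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def draft_survey(
    topic: str,
    retrieve: Callable[[str, int], list[str]],  # (query, k) -> papers
    call_llm: Callable[[str, list[str]], str],  # (prompt, context) -> text
) -> list[str]:
    """Drafting loop mirroring the quoted configuration."""
    # Initial drafting: retrieve 1,200 topic-relevant papers and split them
    # into 30,000-token chunks before asking for an 8-section outline.
    papers = retrieve(topic, TOPIC_RETRIEVAL_K)
    chunks = chunk_by_tokens(papers, CHUNK_WINDOW_TOKENS)
    outline = call_llm(
        f"Write an outline with {NUM_SECTIONS} sections for a survey on {topic}.",
        chunks,
    )

    # Subsection drafting: 60 papers per subsection description, keeping only
    # the main body of each paper (first 1,500 tokens).
    sections = []
    for description in outline.splitlines():  # crude stand-in for outline parsing
        refs = [
            truncate_tokens(p, PAPER_BODY_TOKENS)
            for p in retrieve(description, SUBSECTION_RETRIEVAL_K)
        ]
        draft = call_llm(f"Draft the subsection: {description}", refs)
        # Refinement: N = 2 revision passes over each subsection draft.
        for _ in range(NUM_ITERATIONS):
            draft = call_llm("Revise this subsection for coherence and citations.",
                             [draft, *refs])
        sections.append(draft)
    return sections
```

Passing `retrieve` and `call_llm` in as callables keeps the quoted numeric settings isolated as module-level constants, which makes it easy to check the sketch against the setup reported in the table.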