Can Graph Learning Improve Planning in LLM-based Agents?
Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. The performance gain increases with a larger task graph size. |
| Researcher Affiliation | Collaboration | Fudan University; Microsoft Research Asia; The Chinese University of Hong Kong; Peking University; Washington University in St. Louis |
| Pseudocode | No | The paper describes algorithms and methods in detail but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and datasets are available at https://github.com/WxxShirley/GNN4TaskPlan |
| Open Datasets | Yes | We utilize four datasets across two task planning benchmarks: HuggingFace tasks, Multimedia tasks, and Daily Life API tasks from TaskBench [45], as well as TMDB API tasks from RestBench [50]. |
| Dataset Splits | No | For the datasets from TaskBench, we split 3000 samples for training and 500 samples for testing. While early stopping is mentioned, a specific validation split or its size is not explicitly provided. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A100-80G GPU. ... We utilize 2 NVIDIA A100-80G GPUs for fine-tuning the LLMs. |
| Software Dependencies | No | The paper mentions specific models such as e5-335M [62] and RoBERTa-355M [40] and frameworks such as FastChat, but it does not specify version numbers for any key software components or libraries required for reproduction. |
| Experiment Setup | Yes | During the model training, we set the batch size to 512 and run for 20 epochs with a learning rate of 1e-3. We use the Adam optimizer [25] and implement an early stopping mechanism with a patience of 5 epochs to prevent over-fitting. ... For open-sourced LLMs, the temperature parameter is set to 0.2. (A hedged sketch of this configuration appears after the table.) |
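To make the reported training configuration concrete, below is a minimal PyTorch sketch of a training loop using the stated hyperparameters (batch size 512, 20 epochs, learning rate 1e-3, Adam, early stopping with a patience of 5 epochs). The model, data tensors, and loss function are hypothetical placeholders for illustration only and are not taken from the paper; the authors' actual implementation is in the linked GNN4TaskPlan repository.

```python
# Sketch of the reported training setup: batch size 512, 20 epochs,
# lr 1e-3, Adam, early stopping with patience 5. The model, data, and
# loss below are hypothetical stand-ins, NOT the authors' GNN code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the paper's GNN-based task-graph model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# Dummy tensors sized like the reported 3000-train / 500-test split.
train_set = TensorDataset(torch.randn(3000, 64), torch.randn(3000, 1))
val_set = TensorDataset(torch.randn(500, 64), torch.randn(500, 1))
train_loader = DataLoader(train_set, batch_size=512, shuffle=True)
val_loader = DataLoader(val_set, batch_size=512)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(20):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Early stopping on held-out loss with a patience of 5 epochs.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

Because the paper does not document a separate validation split, the sketch reuses the held-out set only to demonstrate the early-stopping mechanism; any actual reproduction should follow the splits in the official repository.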