Can Graph Learning Improve Planning in LLM-based Agents?

Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. The performance gain increases with a larger task graph size.
Researcher Affiliation | Collaboration | 1 Fudan University, 2 Microsoft Research Asia, 3 The Chinese University of Hong Kong, 4 Peking University, 5 Washington University in St. Louis
Pseudocode | No | The paper describes algorithms and methods in detail but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and datasets are available at https://github.com/WxxShirley/GNN4TaskPlan
Open Datasets | Yes | We utilize four datasets across two task planning benchmarks: Hugging Face tasks, Multimedia tasks, and Daily Life API tasks from TaskBench [45], as well as TMDB API tasks from RestBench [50].
Dataset Splits | No | For the datasets from TaskBench, we split 3000 samples for training and 500 samples for testing. Early stopping is mentioned, but no explicit validation split or size is provided (see the split sketch after the table).
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A100-80G GPU. ... We utilize 2 NVIDIA A100-80G GPUs for fine-tuning the LLMs.
Software Dependencies | No | The paper mentions specific models such as e5-335M [62] and Roberta-355M [40], and frameworks such as FastChat, but it does not specify version numbers for any key software components or libraries required for reproduction.
Experiment Setup | Yes | During the model training, we set the batch size to 512 and run for 20 epochs with a learning rate of 1e-3. We use the Adam optimizer [25] and implement an early stopping mechanism with a patience of 5 epochs to prevent over-fitting. ... For open-sourced LLMs, the temperature parameter is set to 0.2. (A training-loop sketch using these values follows the table.)
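
The stated 3000-train / 500-test split can be reproduced in a few lines. Below is a minimal sketch, assuming the TaskBench examples are already loaded as a Python list; the loader, the seed, and the shuffling step are assumptions on our part, since the paper specifies neither a seed nor a validation set.

```python
import random

def split_task_bench(samples, n_train=3000, n_test=500, seed=0):
    """Reproduce the reported 3000-train / 500-test split.

    `samples` is a hypothetical list of loaded TaskBench examples. The
    paper gives no random seed and describes no validation partition, so
    early stopping would have to monitor one of these two splits.
    """
    rng = random.Random(seed)  # assumed seed -- not specified in the paper
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]
```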
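The hyperparameters in the Experiment Setup row are concrete enough to sketch a training loop. The PyTorch code below is an illustration only, assuming a generic classification-style `model` and map-style datasets; the actual GNN architecture and task-graph data come from the authors' repository, and the `evaluate` helper is our own addition to drive early stopping.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

@torch.no_grad()
def evaluate(model, dataset, batch_size=512):
    """Mean validation loss; a hypothetical helper used for early stopping."""
    model.eval()
    total, count = 0.0, 0
    for x, y in DataLoader(dataset, batch_size=batch_size):
        total += nn.functional.cross_entropy(model(x), y, reduction="sum").item()
        count += y.numel()
    return total / count

def train(model, train_set, val_set,
          epochs=20, batch_size=512, lr=1e-3, patience=5):
    """Reported recipe: Adam at lr 1e-3, batch size 512, 20 epochs,
    early stopping with a patience of 5 epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    best_loss, bad_epochs = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
        val_loss = evaluate(model, val_set)
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:  # stop after 5 non-improving epochs
                break
    return model
```

The only other decoding detail the row pins down is the sampling temperature of 0.2 for open-sourced LLMs, which would simply be passed as `temperature=0.2` to whatever generation call the serving stack (e.g., FastChat) exposes.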