Can Graph Learning Improve Planning in LLM-based Agents?
Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. The performance gain increases with a larger task graph size. |
| Researcher Affiliation | Collaboration | Fudan University; Microsoft Research Asia; The Chinese University of Hong Kong; Peking University; Washington University in St. Louis |
| Pseudocode | No | The paper describes algorithms and methods in detail but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and datasets are available at https://github.com/WxxShirley/GNN4TaskPlan |
| Open Datasets | Yes | We utilize four datasets across two task planning benchmarks: HuggingFace tasks, Multimedia tasks, and Daily Life API tasks from TaskBench [45], as well as TMDB API tasks from RestBench [50]. |
| Dataset Splits | No | For the datasets from TaskBench, we split 3000 samples for training and 500 samples for testing. While early stopping is mentioned, a specific validation split or its size is not explicitly provided. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A100-80G GPU. ... We utilize 2 NVIDIA A100-80G GPUs for fine-tuning the LLMs. |
| Software Dependencies | No | The paper mentions specific models such as e5-335M [62] and RoBERTa-355M [40] and frameworks such as FastChat, but it does not specify version numbers for any key software components or libraries required for reproduction. |
| Experiment Setup | Yes | During the model training, we set the batch size to 512 and run for 20 epochs with a learning rate of 1e-3. We use the Adam optimizer [25] and implement an early stopping mechanism with a patience of 5 epochs to prevent over-fitting. ... For open-sourced LLMs, the temperature parameter is set to 0.2. (A hedged sketch of this configuration appears after the table.) |
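To make the reported training configuration concrete, below is a minimal PyTorch sketch of a training loop using the stated hyperparameters (batch size 512, 20 epochs, learning rate 1e-3, Adam, early stopping with a patience of 5 epochs). The model, data tensors, and loss function are hypothetical placeholders for illustration only and are not taken from the paper; the authors' actual implementation is in the linked GNN4TaskPlan repository.

```python
# Sketch of the reported training setup: batch size 512, 20 epochs,
# lr 1e-3, Adam, early stopping with patience 5. The model, data, and
# loss below are hypothetical stand-ins, NOT the authors' GNN code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the paper's GNN-based task-graph model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# Dummy tensors sized like the reported 3000-train / 500-test split.
train_set = TensorDataset(torch.randn(3000, 64), torch.randn(3000, 1))
val_set = TensorDataset(torch.randn(500, 64), torch.randn(500, 1))
train_loader = DataLoader(train_set, batch_size=512, shuffle=True)
val_loader = DataLoader(val_set, batch_size=512)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(20):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Early stopping on held-out loss with a patience of 5 epochs.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

Because the paper does not document a separate validation split, the sketch reuses the held-out set only to demonstrate the early-stopping mechanism; any actual reproduction should follow the splits in the official repository.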