Agent Planning with World Knowledge Model
Authors: Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs, Mistral-7B, Gemma-7B, and Llama-3-8B, demonstrate that our method can achieve superior performance compared to various strong baselines. |
| Researcher Affiliation | Collaboration | Zhejiang University; National University of Singapore, NUS-NCS Joint Lab; Alibaba Group; Zhejiang Key Laboratory of Big Data Intelligent Computing |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code is available at https://github.com/zjunlp/WKM. |
| Open Datasets | Yes | We evaluate our method on three real-world simulated planning datasets: ALFWorld [41], WebShop [53], and ScienceWorld [50]. |
| Dataset Splits | Yes | Table 5 (dataset statistics): ALFWorld — Train 3,119, Text-Seen 140, Text-Unseen 134; WebShop — Train 1,824, Test 200; ScienceWorld — Train 1,483, Text-Seen 194, Text-Unseen 211. |
| Hardware Specification | Yes | All the training and inference experiments are conducted on 8 NVIDIA V100 32G GPUs within 12 hours. |
| Software Dependencies | No | We fine-tune the proposed approach with LoRA [12] using the LlamaFactory [62] framework. |
| Experiment Setup | Yes | Table 6 (detailed hyperparameters): lora_r 8; lora_alpha 16; lora_dropout 0.05; LoRA target modules q_proj, v_proj; cutoff length 2048; epochs 3; batch size 32; batch size per device 4; gradient accumulation steps 2; learning rate 1e-4; warmup ratio 0.03; temperature 0.0, 0.5; retrieved state knowledge N 3000; P_agent weight γ 0.4, 0.5, 0.7 (see the sketches below the table). |
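The training hyperparameters in Table 6 map directly onto a standard LoRA fine-tuning setup. The sketch below expresses them with Hugging Face `peft` and `transformers` rather than the LlamaFactory framework the authors actually used; the model name, output path, and device mapping are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the Table 6 setup using Hugging Face peft/transformers.
# NOTE: the authors used LlamaFactory; this translation, the model choice,
# and the output path are assumptions for illustration only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model_name = "meta-llama/Meta-Llama-3-8B"  # one of the three evaluated backbones

lora_config = LoraConfig(
    r=8,                                  # lora r
    lora_alpha=16,                        # lora alpha
    lora_dropout=0.05,                    # lora dropout
    target_modules=["q_proj", "v_proj"],  # lora target modules
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(model_name)
model = get_peft_model(model, lora_config)

# Table 6 lists a global batch size of 32 with 4 samples per device and
# 2 gradient-accumulation steps; how that maps onto the 8 V100s is not
# stated, so no device count is asserted here.
training_args = TrainingArguments(
    output_dir="wkm-lora",                # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    warmup_ratio=0.03,
)
```

The cutoff length of 2048 would be applied when tokenizing the training trajectories, and the two temperatures (0.0, 0.5) presumably govern sampling at inference time rather than training.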
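Similarly, the γ values (0.4, 0.5, 0.7) suggest a weighting between the agent model's own action probabilities and knowledge-guided ones at inference time. The toy sketch below assumes a simple convex combination; `combine_action_probs`, `p_agent`, and `p_knowledge` are illustrative names, and the paper's exact combination rule should be checked against the source.

```python
import numpy as np

def combine_action_probs(p_agent: np.ndarray, p_knowledge: np.ndarray,
                         gamma: float = 0.5) -> int:
    """Convex combination of two action distributions, then a greedy pick.

    Only an illustration of how a weight like gamma (0.4/0.5/0.7 in Table 6)
    can trade off the agent model's action probabilities against
    knowledge-guided ones; the paper's actual rule may differ.
    """
    mixed = gamma * p_agent + (1.0 - gamma) * p_knowledge
    return int(np.argmax(mixed))

# Toy usage with three candidate actions: the knowledge-guided
# distribution overrides the agent's preference when gamma is low.
p_agent = np.array([0.6, 0.3, 0.1])
p_knowledge = np.array([0.1, 0.2, 0.7])
print(combine_action_probs(p_agent, p_knowledge, gamma=0.4))  # -> 2
```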