Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Agent Planning with World Knowledge Model
Authors: Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs, Mistral-7B, Gemma-7B, and Llama-3-8B, demonstrate that our method can achieve superior performance compared to various strong baselines. |
| Researcher Affiliation | Collaboration | Zhejiang University; National University of Singapore, NUS-NCS Joint Lab; Alibaba Group; Zhejiang Key Laboratory of Big Data Intelligent Computing |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The code is available at https://github.com/zjunlp/WKM. |
| Open Datasets | Yes | We evaluate our method on three real-world simulated planning datasets: ALFWorld [41], WebShop [53], and ScienceWorld [50]. |
| Dataset Splits | Yes | Table 5: Dataset statistics (Train / Text-Seen / Text-Unseen). ALFWorld: 3,119 / 140 / 134; WebShop: 1,824 train, 200 evaluation (only one evaluation count is listed); ScienceWorld: 1,483 / 194 / 211. |
| Hardware Specification | Yes | All the training and inference experiments are conducted on 8 NVIDIA V100 32G GPUs within 12 hours. |
| Software Dependencies | No | We fine-tune the proposed approach with LoRA [12] using the LlamaFactory [62] framework. |
| Experiment Setup | Yes | Table 6: Detailed hyperparameters used in our paper. lora r: 8; lora alpha: 16; lora dropout: 0.05; lora target modules: q_proj, v_proj; cutoff len: 2048; epochs: 3; batch size: 32; batch size per device: 4; gradient accumulation steps: 2; learning rate: 1e-4; warmup ratio: 0.03; temperature: 0.0, 0.5; retrieved state knowledge N: 3000; P_agent(A_u) weight γ: 0.4, 0.5, 0.7. |
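
The hyperparameters in the Experiment Setup row can be collected into a small configuration object to make their relationships easier to check. The sketch below is illustrative only: the `LoraSetup` class and `implied_device_count` helper are hypothetical names, not part of the authors' code, and the relation "global batch size = per-device batch size × gradient accumulation steps × number of data-parallel processes" is the common convention, assumed here rather than stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetup:
    """Values transcribed from the Experiment Setup row (Table 6).

    Hypothetical container for illustration; not the authors' config format.
    """
    lora_r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: tuple = ("q_proj", "v_proj")
    cutoff_len: int = 2048
    epochs: int = 3
    batch_size: int = 32
    batch_size_per_device: int = 4
    gradient_accumulation_steps: int = 2
    learning_rate: float = 1e-4
    warmup_ratio: float = 0.03


def implied_device_count(setup: LoraSetup) -> int:
    """Number of data-parallel processes implied by the batch settings.

    Assumes: batch_size = per_device * accumulation_steps * processes.
    """
    per_process = setup.batch_size_per_device * setup.gradient_accumulation_steps
    if setup.batch_size % per_process != 0:
        raise ValueError("batch size is not a multiple of per-process samples")
    return setup.batch_size // per_process


setup = LoraSetup()
print(implied_device_count(setup))       # 32 // (4 * 2) = 4
print(setup.lora_alpha / setup.lora_r)   # standard LoRA scaling alpha / r = 2.0
```

Under the assumed relation, the listed batch settings correspond to 4 data-parallel processes, while the Hardware Specification row reports 8 GPUs; the source does not say how the GPUs were allocated per run, so this should be read as a consistency check and not as a description of the authors' setup.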