GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

Authors: Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li, Ying Shan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our method on various language models, which not only significantly improves the accuracy of invoking seen tools but also enables the zero-shot capacity for unseen tools. The code and demo have been available at https://github.com/AILab-CVC/GPT4Tools.
Researcher Affiliation | Collaboration | Rui Yang (1), Lin Song (2), Yanwei Li (3), Sijie Zhao (2), Yixiao Ge (2), Xiu Li (1), Ying Shan (2); (1) Tsinghua Shenzhen International Graduate School, Tsinghua University; (2) Tencent AI Lab; (3) Chinese University of Hong Kong
Pseudocode | No | The paper describes the data generation process and instruction tuning steps in natural language and via a diagram (Figure 1), but does not include any pseudocode or formal algorithm blocks.
Open Source Code | Yes | The code and demo have been available at https://github.com/AILab-CVC/GPT4Tools.
Open Datasets | Yes | During generation, all image information utilized in GPT4Tools is sourced from the training set of COCO [43].
Dataset Splits | Yes | This evaluation dataset is partitioned into two components: the first part (validation set) has the same ingredients as the training set, encompassing 23 tools; the second part (test set) comprises 8 novel tools absent from the training set. ... The validation set contains the same tools as the training set, with approximately 50 items associated with each tool. (A hedged sketch of such a seen/unseen split follows the table.)
Hardware Specification | No | The paper mentions training on various language models and using LoRA optimization to make tuning feasible, but it does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used for the experiments.
Software Dependencies | No | Specifically, we equipped the projection layers of query, key, value, and output with LoRA layers. The LoRA attention dimension and scaling alpha were set to 16. While the language model was kept frozen, the LoRA layers were optimized using AdamW [46]. The paper mentions techniques like LoRA and AdamW, and the LLMs used, but does not specify version numbers for any software dependencies, such as Python, PyTorch, or specific library versions. (A hedged sketch of this LoRA setup follows the table.)
Experiment Setup | Yes | All models were fine-tuned over 3 epochs, with a batch size of 512. The learning rate was set to 3 × 10^-4, and the maximum length of new tokens was restricted to 2048. (A hedged sketch of these hyperparameters follows the table.)
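
The dataset-splits row above describes a validation set covering the 23 seen tools and a test set of 8 unseen tools, with roughly 50 items per tool. Below is a minimal sketch of how such a seen/unseen partition could be materialized; the tool names, the item structure, and the function split_by_tool are illustrative assumptions, not the authors' released code.

    # Hypothetical sketch: route each evaluation item to the validation split
    # if its tool was seen during training, otherwise to the unseen-tool test split.
    from collections import defaultdict

    def split_by_tool(items, seen_tools):
        """items: dicts with a 'tool' key; seen_tools: names of the 23 training tools."""
        splits = defaultdict(list)
        for item in items:
            target = "validation" if item["tool"] in seen_tools else "test"
            splits[target].append(item)
        return splits

    # Illustrative tool names only.
    seen_tools = {"Segment the Image", "Edge Detection On Image"}
    items = [
        {"tool": "Segment the Image", "instruction": "segment the dog in image/example.png"},
        {"tool": "Predict Depth On Image", "instruction": "estimate depth for image/example.png"},
    ]
    splits = split_by_tool(items, seen_tools)
    print(len(splits["validation"]), len(splits["test"]))  # -> 1 1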
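
The software-dependencies row quotes the LoRA configuration: rank and scaling alpha of 16, LoRA layers on the query/key/value/output projections, a frozen base model, and AdamW. The sketch below expresses that setup with the Hugging Face PEFT library; the base-model identifier, target-module names, and dropout value are assumptions that depend on the exact LLM being tuned.

    # Sketch of the quoted LoRA setup with Hugging Face PEFT; model id, module
    # names, and dropout are assumptions, the rank/alpha values follow the paper.
    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained(
        "huggyllama/llama-7b",             # assumed base model
        torch_dtype=torch.float16,
    )

    lora_cfg = LoraConfig(
        r=16,                              # LoRA attention dimension (paper: 16)
        lora_alpha=16,                     # scaling alpha (paper: 16)
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # query/key/value/output projections
        lora_dropout=0.05,                 # assumed; not stated in the paper
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora_cfg)  # base weights stay frozen; only LoRA layers train
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad),
        lr=3e-4,                           # learning rate quoted in the experiment-setup row
    )

Training only the injected low-rank matrices while the base weights stay frozen is what makes the tuning lightweight, matching the hardware row's note that LoRA optimization is used to make tuning feasible.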
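
The experiment-setup row reports 3 epochs, an effective batch size of 512, a learning rate of 3 × 10^-4, and a 2048-token cap on new tokens. A hedged sketch of those numbers as Hugging Face TrainingArguments follows; the per-device batch size, gradient-accumulation steps, output directory, and mixed-precision flag are assumptions chosen only so the effective batch size works out to 512.

    # Sketch of the reported hyperparameters; only the commented "paper:" values
    # come from the paper, the batch-size factorization and other flags are assumptions.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="gpt4tools-lora",       # assumed
        num_train_epochs=3,                # paper: 3 epochs
        per_device_train_batch_size=4,     # assumed
        gradient_accumulation_steps=128,   # 4 * 128 = 512 effective batch size (single device assumed)
        learning_rate=3e-4,                # paper: 3 x 10^-4
        fp16=True,                         # assumed
        logging_steps=10,                  # assumed
    )

    # At inference time, the quoted cap on newly generated tokens could be applied as:
    # outputs = model.generate(**inputs, max_new_tokens=2048)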