ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
Authors: Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. |
| Researcher Affiliation | Academia | Shibo Hao (UC San Diego), Tianyang Liu (UC San Diego), Zhen Wang (UC San Diego; Mohamed bin Zayed University of Artificial Intelligence), Zhiting Hu (UC San Diego) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Ber666/ToolkenGPT |
| Open Datasets | Yes | To evaluate the tool-learning proficiency in numerical reasoning comprehensively, we curate two new test datasets: (1) GSM8K-XL, an enhanced version of the existing GSM8K [10] dataset. |
| Dataset Splits | Yes | We get 6,054 examples, of which 1,000 were allocated for validation, and 5,054 for the training data. |
| Hardware Specification | Yes | In terms of computational resources, we train and test ToolkenGPT based on LLaMA-13B and LLaMA-33B using 2 and 4 Nvidia RTX 3090 GPUs, respectively. |
| Software Dependencies | No | The paper mentions using specific models like LLaMA-13B, ChatGPT (gpt-3.5-turbo), and Sentence RoBERTa-large, but does not provide specific version numbers for these or other software dependencies like deep learning frameworks or Python packages. |
| Experiment Setup | Yes | The embeddings were trained with a learning rate of 5e-4, performing early stopping based on the development set, with a maximum of 10 epochs. |
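The rows above pin down the core technical ingredients: a frozen LLaMA backbone augmented with trainable tool embeddings ("toolkens"), trained with a learning rate of 5e-4, early stopping on the development set, and at most 10 epochs. The following is a minimal PyTorch sketch of that setup, not the authors' released implementation: names such as `ToolkenHead`, `train_toolken_embeddings`, and `evaluate`, as well as the HuggingFace-style `output_hidden_states` / `.logits` model interface, are assumptions made for illustration; see https://github.com/Ber666/ToolkenGPT for the actual code.

```python
import torch
import torch.nn as nn


class ToolkenHead(nn.Module):
    """Trainable tool embeddings appended to a frozen LM's output vocabulary.

    Hypothetical names and shapes; the released repository may organize this differently.
    """

    def __init__(self, hidden_dim: int, num_tools: int):
        super().__init__()
        # One embedding (one extra row of the unembedding matrix) per tool ("toolken").
        self.tool_embeddings = nn.Parameter(torch.empty(num_tools, hidden_dim))
        nn.init.normal_(self.tool_embeddings, std=0.02)

    def forward(self, hidden_states: torch.Tensor, vocab_logits: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq, hidden_dim) from the frozen backbone
        # vocab_logits:  (batch, seq, vocab_size) from the frozen LM head
        tool_logits = hidden_states @ self.tool_embeddings.T      # (batch, seq, num_tools)
        return torch.cat([vocab_logits, tool_logits], dim=-1)     # extended vocabulary


@torch.no_grad()
def evaluate(model, head, loader, device="cuda"):
    """Average next-token loss over the extended vocabulary on a dev set (assumed helper)."""
    head.eval()
    total, count = 0.0, 0
    for batch in loader:
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device)                       # ids over the extended vocab
        out = model(input_ids, output_hidden_states=True)
        logits = head(out.hidden_states[-1], out.logits)
        loss = nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
        )
        total += loss.item()
        count += 1
    return total / max(count, 1)


def train_toolken_embeddings(model, head, train_loader, dev_loader, device="cuda"):
    """Only the toolken embeddings receive gradients; the language model stays frozen."""
    for p in model.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(head.parameters(), lr=5e-4)      # learning rate from the paper
    best_dev_loss, best_state = float("inf"), None

    for epoch in range(10):                                       # at most 10 epochs (paper)
        head.train()
        for batch in train_loader:
            input_ids = batch["input_ids"].to(device)
            labels = batch["labels"].to(device)
            with torch.no_grad():                                 # frozen backbone forward pass
                out = model(input_ids, output_hidden_states=True)
            logits = head(out.hidden_states[-1], out.logits)
            loss = nn.functional.cross_entropy(
                logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Early stopping based on the development set, as stated in the setup row.
        dev_loss = evaluate(model, head, dev_loader, device)
        if dev_loss < best_dev_loss:
            best_dev_loss = dev_loss
            best_state = {k: v.clone() for k, v in head.state_dict().items()}
        else:
            break

    if best_state is not None:
        head.load_state_dict(best_state)
    return head
```

Because only `tool_embeddings` is optimized, the memory and compute cost is a small fraction of full fine-tuning, which is consistent with the hardware row above (LLaMA-13B/33B on 2 and 4 RTX 3090 GPUs).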