Gorilla: Large Language Model Connected with Massive APIs
Authors: Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of Hugging Face, Torch Hub, and Tensor Hub APIs. Using APIBench, we finetune Gorilla, a LLaMA-7B-based model with document retrieval, and show that it significantly outperforms both open-source and closed-source models like Claude and GPT-4 in terms of API functionality accuracy as well as a reduction in API argument hallucination errors. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. |
| Researcher Affiliation | Collaboration | Shishir G. Patil (UC Berkeley), Tianjun Zhang (UC Berkeley), Xin Wang (Microsoft Research), Joseph E. Gonzalez (UC Berkeley) |
| Pseudocode | No | The paper describes methodologies in text and figures but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Gorilla's code, model, data, and demo are available at: https://gorilla.cs.berkeley.edu |
| Open Datasets | Yes | To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of Hugging Face, Torch Hub, and Tensor Hub APIs. APIBench (~1600 APIs) is constructed by scraping a large corpus of ML APIs. |
| Dataset Splits | No | The paper states data splits for training and testing (e.g., '90% training and 10% evaluation' for Hugging Face, and '80% training and 20% testing' for Torch Hub and Tensor Hub), but does not explicitly mention a separate 'validation' split for hyperparameter tuning or early stopping. (A hedged split sketch follows the table.) |
| Hardware Specification | Yes | We finetune it on 8x A100 GPUs with 40GB memory each. |
| Software Dependencies | No | The paper mentions software components and libraries such as 'Hugging Face API', 'Torch Hub', 'Tensor Hub', 'PyTorch', 'TensorFlow', 'transformers', 'sentencepiece', 'accelerate', 'CUDA', and 'cudnn', but it does not provide a comprehensive list of key dependencies with the specific version numbers required for reproduction. |
| Experiment Setup | Yes | We train Gorilla for 5 epochs with a 2e-5 learning rate and cosine decay. The details are provided in Table 6: learning rate 2e-5; batch size 64; epochs 5; warmup ratio 0.03; weight decay 0; max seq length 2048. (A hedged configuration sketch follows the table.) |
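
The split ratios quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' preprocessing code: only the 90/10 and 80/20 fractions come from the paper, while the record format, shuffling, and seed below are assumptions.

```python
import random

def split_corpus(records, train_fraction, seed=0):
    """Shuffle a list of API records and split it into train/eval portions."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Dummy stand-ins for the scraped API records (APIBench itself covers ~1600 APIs).
huggingface_records = [{"api": f"hf_api_{i}"} for i in range(100)]
torchhub_records = [{"api": f"torchhub_api_{i}"} for i in range(50)]

hf_train, hf_eval = split_corpus(huggingface_records, 0.90)  # 90% train / 10% eval (Hugging Face)
th_train, th_test = split_corpus(torchhub_records, 0.80)     # 80% train / 20% test (Torch Hub, Tensor Hub)
```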
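
The hyperparameters in the Experiment Setup row map naturally onto Hugging Face `transformers` training arguments. The sketch below is an illustration under assumptions, not the authors' released training script: the per-device/accumulation decomposition of the global batch size of 64, the output path, and the use of bf16 are guesses, while the learning rate, cosine scheduler, epochs, warmup ratio, weight decay, and maximum sequence length are the values reported in the paper.

```python
from transformers import TrainingArguments

MAX_SEQ_LENGTH = 2048  # "max seq length 2048"; applied at tokenization time, not via TrainingArguments

training_args = TrainingArguments(
    output_dir="gorilla-llama-7b-apibench",  # hypothetical output path
    num_train_epochs=5,                      # "5 epochs"
    learning_rate=2e-5,                      # "2e-5 learning rate"
    lr_scheduler_type="cosine",              # "cosine decay"
    warmup_ratio=0.03,                       # "warmup ratio 0.03"
    weight_decay=0.0,                        # "weight decay 0"
    per_device_train_batch_size=8,           # assumed: 8 GPUs x 8 per device = global batch size 64
    gradient_accumulation_steps=1,           # assumed
    bf16=True,                               # assumed; supported on the reported A100 GPUs
)
```

A `Trainer` built on these arguments, fed APIBench instruction/API pairs tokenized to the 2048-token limit, would approximate the reported setup at a high level.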