Gorilla: Large Language Model Connected with Massive APIs

Authors: Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of Hugging Face, Torch Hub, and Tensor Hub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Using APIBench, we finetune Gorilla, a LLaMA-7B-based model with document retrieval, and show that it significantly outperforms both open-source and closed-source models like Claude and GPT-4 in terms of API functionality accuracy as well as a reduction in API argument hallucination errors. (A retrieval-prompt sketch follows the table.)
Researcher Affiliation | Collaboration | Shishir G. Patil (1), Tianjun Zhang (1), Xin Wang (2), Joseph E. Gonzalez (1); (1) UC Berkeley, (2) Microsoft Research
Pseudocode | No | The paper describes methodologies in text and figures but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Gorilla's code, model, data, and demo are available at: https://gorilla.cs.berkeley.edu
Open Datasets | Yes | To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of Hugging Face, Torch Hub, and Tensor Hub APIs. APIBench (~1600 APIs) was built by scraping a large corpus of ML APIs.
Dataset Splits | No | The paper states training and testing splits (e.g., '90% training and 10% evaluation' for Hugging Face, and '80% training and 20% testing' for Torch Hub and Tensor Hub), but it does not mention a separate validation split for hyperparameter tuning or early stopping. (A split sketch follows the table.)
Hardware Specification | Yes | We finetune it on 8x A100 with 40 GB memory each.
Software Dependencies | No | The paper mentions software components and libraries such as the Hugging Face API, Torch Hub, Tensor Hub, PyTorch, TensorFlow, transformers, sentencepiece, accelerate, CUDA, and cuDNN, but it does not provide specific version numbers for the key dependencies required for reproduction. (A version-recording sketch follows the table.)
Experiment Setup | Yes | We train Gorilla for 5 epochs with a 2e-5 learning rate with cosine decay. The details are provided in Table 6: learning rate 2e-5, batch size 64, epochs 5, warmup ratio 0.03, weight decay 0, max seq length 2048. (A configuration sketch follows the table.)
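
Retrieval-prompt sketch (Research Type row). The row above summarizes Gorilla's retriever-aware finetuning, in which retrieved API documentation accompanies the user instruction. The following is a minimal sketch of that prompt construction; the function name, prompt wording, and example API document are illustrative assumptions, not the released Gorilla code.

```python
# Minimal sketch of retrieval-aware prompting: the user instruction is paired
# with a retrieved API document before being passed to the model. Names and
# the prompt template below are assumptions made for illustration.
from typing import Optional

def build_prompt(instruction: str, retrieved_api_doc: Optional[str]) -> str:
    """Prepend retrieved API documentation to the instruction (retriever-aware mode)."""
    if retrieved_api_doc is None:
        # Zero-shot mode: the model must recall the API purely from finetuning.
        return instruction
    return (
        f"{instruction}\n"
        f"Use this API documentation for reference: {retrieved_api_doc}"
    )

prompt = build_prompt(
    "I want to classify the sentiment of movie reviews.",
    '{"domain": "Text Classification", "api_call": "pipeline(\'sentiment-analysis\')"}',
)
print(prompt)
```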
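
Split sketch (Dataset Splits row). Since only train/test splits are reported, a reproducer who wants a validation set for tuning or early stopping has to carve one out themselves. A minimal sketch, assuming the data is a JSON list of records; the file name and the 5% validation fraction are assumptions.

```python
# Sketch of reproducing the reported 90/10 train/eval split and carving an
# additional validation subset out of the training portion. File name, record
# format, and validation fraction are illustrative assumptions.
import json
import random

with open("apibench_huggingface.json") as f:  # hypothetical file name
    records = json.load(f)

random.seed(0)
random.shuffle(records)

cut = int(0.9 * len(records))
train, evaluation = records[:cut], records[cut:]

# Optional validation carve-out from the training portion (e.g., for early stopping).
val_cut = int(0.05 * len(train))
validation, train = train[:val_cut], train[val_cut:]
print(len(train), len(validation), len(evaluation))
```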
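
Version-recording sketch (Software Dependencies row). Because the paper does not pin dependency versions, it is worth logging the versions actually used when reproducing the setup. A minimal sketch using only the Python standard library; the package list mirrors the libraries named in the row above.

```python
# Record the installed versions of the Python dependencies named in the report,
# since the paper does not pin them. Standard library only.
import importlib.metadata as md

for pkg in ["torch", "tensorflow", "transformers", "sentencepiece", "accelerate"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg} not installed")
```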
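
Configuration sketch (Experiment Setup row). The hyperparameters from Table 6 can be expressed as a Hugging Face TrainingArguments object. Only the numeric values come from the paper; the output path, the per-device batch size (global batch 64 spread over the reported 8x A100), and the use of the transformers Trainer stack at all are assumptions.

```python
# Hyperparameters from Table 6 expressed as Hugging Face TrainingArguments.
# Only the numeric values come from the paper; the output path and per-device
# batch size (8 GPUs x 8 = global batch 64) are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gorilla-7b-finetune",   # hypothetical output directory
    num_train_epochs=5,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.0,
    per_device_train_batch_size=8,      # global batch size 64 across 8x A100
)
# The max sequence length of 2048 is applied at tokenization time, not here.
```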