Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Authors: Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on real-world instruction datasets using the LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, National University of Singapore; ²Noah's Ark Lab, Huawei. {zangwei, f-xue, yangluo, youy}@comp.nus.edu.sg; {renxiaozhe, jiang.xin}@huawei.com |
| Pseudocode | No | The paper includes diagrams illustrating the pipeline (Figure 1) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/zhengzangw/Sequence-Scheduling |
| Open Datasets | Yes | Our experiments are conducted on two datasets: a set of 10,000 prompts from a subset of the alpaca dataset [33] (which is different from the one used to train the length predictor) and a set of 429 prompts from the Instruction-in-Wild datasets [36]. |
| Dataset Splits | No | The paper states that experiments are conducted on two datasets: 'a set of 10,000 prompts from a subset of the alpaca dataset [33]... and a set of 429 prompts from the Instruction-in-Wild datasets [36].' It does not specify train/validation/test splits for the overall experimental evaluation, implying the full datasets are used for evaluation. |
| Hardware Specification | Yes | The inference is performed on the Vicuna-7B [4] model using an 80GB A100 GPU. The training was conducted on a single 80GB A100 GPU. |
| Software Dependencies | No | The paper states 'All codes are implemented in PyTorch [26]' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For our baseline experiments, we set the batch size to 16. Regarding the variable batch size strategy, we use a batch size of 16 for instructions with a length (L) greater than or equal to 300 ... We maintain a fixed group size of 256. We sample generations with a temperature of 0.5 for diversity in responses. Specifically, we set the learning rate to 0.00005 and trained the model for three epochs. |
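
To make the scheduling setup quoted in the table concrete, below is a minimal Python sketch of length-aware micro-batching under the stated hyperparameters (group size 256, batch size 16 for predicted response lengths ≥ 300). The function name `schedule_by_predicted_length`, the `short_batch_size` value, and the toy predictor are illustrative assumptions, not the authors' implementation; their actual code is available at the GitHub link above.

```python
from typing import Callable, List


def schedule_by_predicted_length(
    prompts: List[str],
    predict_length: Callable[[str], int],
    group_size: int = 256,        # fixed group size from the paper's setup
    long_threshold: int = 300,    # predicted-length cutoff quoted in the paper
    long_batch_size: int = 16,    # batch size quoted for L >= 300
    short_batch_size: int = 32,   # assumption: batch size for shorter predictions is not quoted
) -> List[List[str]]:
    """Group prompts by predicted response length, then cut each group into
    micro-batches whose size depends on that predicted length."""
    micro_batches: List[List[str]] = []
    # Process prompts in fixed-size groups.
    for start in range(0, len(prompts), group_size):
        group = prompts[start : start + group_size]
        # Sort the group so prompts with similar predicted lengths end up in the
        # same micro-batch, reducing padding to the longest response per batch.
        group.sort(key=predict_length)
        i = 0
        while i < len(group):
            # Longer predicted responses get the smaller batch size (16 for L >= 300);
            # the batch size used for shorter predictions here is an illustrative guess.
            if predict_length(group[i]) >= long_threshold:
                bs = long_batch_size
            else:
                bs = short_batch_size
            micro_batches.append(group[i : i + bs])
            i += bs
    return micro_batches


if __name__ == "__main__":
    # Toy length "predictor": pretend longer prompts yield longer responses.
    def toy_predictor(prompt: str) -> int:
        return 50 * len(prompt.split())

    demo_prompts = [f"instruction {'word ' * n}" for n in range(1, 40)]
    batches = schedule_by_predicted_length(demo_prompts, toy_predictor)
    print([len(b) for b in batches])
```

In the paper the length predictor is itself an instruction-tuned LLM rather than a heuristic; the sketch only shows how the predicted lengths drive grouping and variable batch sizes.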