Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Authors: Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on real-world instruction datasets using the LLaMA-based model, and our results demonstrate an impressive 86% improvement in inference throughput without compromising effectiveness.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, National University of Singapore; 2 Noah's Ark Lab, Huawei. {zangwei, f-xue, yangluo, youy}@comp.nus.edu.sg; {renxiaozhe, jiang.xin}@huawei.com
Pseudocode | No | The paper includes diagrams illustrating the pipeline (Figure 1) but does not contain any structured pseudocode or algorithm blocks (a rough sketch of the scheduling idea is given after this table).
Open Source Code | Yes | https://github.com/zhengzangw/Sequence-Scheduling
Open Datasets | Yes | Our experiments are conducted on two datasets: a set of 10,000 prompts from a subset of the alpaca dataset [33] (which is different from the one used to train the length predictor) and a set of 429 prompts from the Instruction-in-Wild datasets [36].
Dataset Splits | No | The paper states that experiments are conducted on two datasets: 'a set of 10,000 prompts from a subset of the alpaca dataset [33]... and a set of 429 prompts from the Instruction-in-Wild datasets [36].' It does not specify train/validation/test splits for the overall experimental evaluation, implying the full datasets are used for evaluation.
Hardware Specification | Yes | The inference is performed on the Vicuna-7B [4] model using an 80GB A100 GPU. The training was conducted on a single 80GB A100 GPU.
Software Dependencies | No | The paper states 'All codes are implemented in PyTorch [26]' but does not provide a specific version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | For our baseline experiments, we set the batch size to 16. Regarding the variable batch size strategy, we use a batch size of 16 for instructions with a length (L) greater than or equal to 300... We maintain a fixed group size of 256. We sample generations with a temperature of 0.5 for diversity in responses. Specifically, we set the learning rate to 0.00005 and trained the model for three epochs. (A configuration sketch mapping these values onto common training arguments is given after this table.)
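As noted in the Pseudocode row, the paper describes its pipeline only with diagrams. The block below is a minimal, hypothetical sketch of the length-grouped scheduling idea, built solely from the figures quoted in the Experiment Setup row (group size 256, batch size 16 for predicted lengths of 300 or more). The function name schedule_batches, the predictor interface, and the short-response batch size of 32 are assumptions, not taken from the authors' repository.

```python
from typing import Callable, List


def schedule_batches(
    prompts: List[str],
    predict_length: Callable[[str], int],
    group_size: int = 256,          # fixed group size quoted in the Experiment Setup row
    long_threshold: int = 300,      # predicted-length cutoff quoted in the paper
    long_batch_size: int = 16,      # batch size for long responses, quoted in the paper
    short_batch_size: int = 32,     # assumed value; the quoted setup only gives the long case
) -> List[List[str]]:
    """Group prompts by predicted response length so each micro-batch holds
    sequences of similar length, reducing padding during batched generation."""
    batches: List[List[str]] = []
    for start in range(0, len(prompts), group_size):
        group = prompts[start:start + group_size]
        # Sort the group by predicted response length (ascending).
        group.sort(key=predict_length)
        i = 0
        while i < len(group):
            # Tentatively take a large batch of short responses; if its longest
            # predicted length crosses the threshold, use the smaller batch size.
            candidate = group[i:i + short_batch_size]
            use_long = predict_length(candidate[-1]) >= long_threshold
            bs = long_batch_size if use_long else short_batch_size
            batches.append(group[i:i + bs])
            i += bs
    return batches
```

In the paper the predictor is the instruction-tuned LLM itself; in this sketch any callable that returns an estimated token count can be plugged in, for example schedule_batches(prompts, predict_length=lambda p: 4 * len(p.split())).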
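For the fine-tuning settings quoted in the Experiment Setup row (learning rate 0.00005, three epochs, a single 80GB A100), a hedged reproduction sketch using Hugging Face TrainingArguments could look as follows; the output path, per-device batch size, and bf16 flag are assumptions, since the quoted excerpt does not state them.

```python
from transformers import TrainingArguments

# Sketch that maps the quoted fine-tuning hyperparameters onto Hugging Face
# TrainingArguments; values marked "assumed" are not from the paper.
training_args = TrainingArguments(
    output_dir="./length-predictor",   # assumed output path
    learning_rate=5e-5,                # "we set the learning rate to 0.00005"
    num_train_epochs=3,                # "trained the model for three epochs"
    per_device_train_batch_size=8,     # assumed; not given in the quoted setup
    bf16=True,                         # assumed; reasonable on a single 80GB A100
)
```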