Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

Authors: Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality." (A minimal routing sketch follows the table.)
Researcher Affiliation | Collaboration | Dujian Ding (University of British Columbia), Ankur Mallick (Microsoft), Chi Wang (Microsoft), Robert Sim (Microsoft), Subhabrata Mukherjee (Hippocratic AI), Victor Rühle (Microsoft), Laks V. S. Lakshmanan (University of British Columbia), Ahmed Hassan Awadallah (Microsoft)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We have made our source code available at https://github.com/m365-core/hybrid_llm_routing."
Open Datasets | Yes | "We use the MixInstruct dataset from (Jiang et al., 2023) to evaluate the effectiveness of different routing strategies. MixInstruct consists of a wide range of tasks (e.g., question answering, summarization, information extraction) and enables us to train a generic router that will be effective across different scenarios. We present additional information about this dataset in Appendix B. [...] We uniformly sample 10k training examples from the training split of MixInstruct, for each of which we generate 10 responses from all investigated LLMs."
Dataset Splits | Yes | "Our validation and test splits are the same as the MixInstruct dataset, which consist of 5k instruction examples separately." (A data-preparation sketch covering the sampling and splits follows the table.)
Hardware Specification | Yes | "All experiments are conducted with 1 NVIDIA A100 GPU of 80GB GPU RAM."
Software Dependencies | No | The paper mentions software components like 'DeBERTa-v3-large', 'LangChain scoring evaluator', and 'GPT-4' but does not provide specific version numbers for ancillary software dependencies such as programming languages or libraries.
Experiment Setup | Yes | "We train each router with the corresponding loss from Section 3 for 5 epochs and use the validation set to choose the best checkpoints for final evaluation." (A training-loop sketch follows the table.)
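
The Research Type row quotes the paper's headline result: routing up to 40% of queries away from the large model with no quality drop. The sketch below shows the general shape of such threshold-based routing, assuming a DeBERTa-style router that scores how likely the small model's response is to match the large model's quality. The checkpoint path, `route` function, and `THRESHOLD` value are illustrative assumptions, not the paper's exact API.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
router = AutoModelForSequenceClassification.from_pretrained(
    "path/to/trained_router", num_labels=1
)  # hypothetical fine-tuned router checkpoint
router.eval()

THRESHOLD = 0.5  # illustrative; in practice tuned on the validation split

def route(query: str) -> str:
    """Return 'small' or 'large' depending on the router's quality score."""
    inputs = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = torch.sigmoid(router(**inputs).logits).item()
    return "small" if score >= THRESHOLD else "large"

# Only queries routed to "large" incur a call to the expensive model,
# which is where the reported cost savings come from.
```

Raising the threshold sends more traffic to the small model (lower cost, more quality risk); lowering it does the opposite.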
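
The Open Datasets and Dataset Splits rows describe uniformly sampling 10k training examples and reusing MixInstruct's 5k validation and test splits. A minimal data-preparation sketch is below; the Hugging Face dataset id "llm-blender/mix-instruct" and the split names follow the public LLM-Blender release and are assumptions here, not details quoted from the paper.

```python
from datasets import load_dataset

# Assumed dataset id and split names for the MixInstruct release.
mix = load_dataset("llm-blender/mix-instruct")

train = mix["train"].shuffle(seed=0).select(range(10_000))  # 10k uniform sample
val, test = mix["validation"], mix["test"]                  # 5k examples each

print(len(train), len(val), len(test))
```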
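
The Experiment Setup row specifies 5 training epochs with validation-based checkpoint selection. The sketch below shows one way such a loop could look, assuming each example carries a tokenized `inputs` dict and a scalar quality label in [0, 1]; the BCE-style loss stands in for the router losses of Section 3, and the optimizer, batch size, and learning rate are illustrative, not taken from the paper.

```python
import torch
from torch.utils.data import DataLoader

def train_router(router, train_set, val_set, epochs=5, lr=1e-5):
    opt = torch.optim.AdamW(router.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_state, best_val = None, float("inf")
    for _ in range(epochs):
        router.train()
        for batch in DataLoader(train_set, batch_size=16, shuffle=True):
            opt.zero_grad()
            logits = router(**batch["inputs"]).logits.squeeze(-1)
            loss = loss_fn(logits, batch["label"].float())
            loss.backward()
            opt.step()
        # Keep the checkpoint that does best on the validation split.
        router.eval()
        with torch.no_grad():
            val_loss = sum(
                loss_fn(router(**b["inputs"]).logits.squeeze(-1),
                        b["label"].float()).item()
                for b in DataLoader(val_set, batch_size=16))
        if val_loss < best_val:
            best_val = val_loss
            best_state = {k: v.clone() for k, v in router.state_dict().items()}
    router.load_state_dict(best_state)
    return router
```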