Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Authors: Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality. |
| Researcher Affiliation | Collaboration | Dujian Ding¹, Ankur Mallick², Chi Wang², Robert Sim², Subhabrata Mukherjee³, Victor Rühle², Laks V. S. Lakshmanan¹, Ahmed Awadallah²; ¹University of British Columbia, ²Microsoft, ³Hippocratic AI |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have made our source code available at https://github.com/m365-core/hybrid_llm_routing. |
| Open Datasets | Yes | We use the MixInstruct dataset from (Jiang et al., 2023) to evaluate the effectiveness of different routing strategies. MixInstruct consists of a wide range of tasks (e.g., question answering, summarization, information extraction) and enables us to train a generic router that will be effective across different scenarios. We present additional information about this dataset in Appendix B. [...] We uniformly sample 10k training examples from the training split of MixInstruct, for each of which we generate 10 responses from all investigated LLMs. |
| Dataset Splits | Yes | Our validation and test splits are the same as the MixInstruct dataset, which consist of 5k instruction examples separately. |
| Hardware Specification | Yes | All experiments are conducted with 1 NVIDIA A100 GPU of 80GB GPU RAM. |
| Software Dependencies | No | The paper mentions software components like 'DeBERTa-v3-large', 'LangChain scoring evaluator', and 'GPT-4' but does not provide specific version numbers for ancillary software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | We train each router with the corresponding loss from Section 3 for 5 epochs and use the validation set to choose the best checkpoints for final evaluation. |
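
The Experiment Setup row describes training each router for 5 epochs and using the validation set to choose the best checkpoint. A minimal sketch of that selection loop follows. It is not the authors' code: the helper names `train_one_epoch` and `validation_loss` are hypothetical, and using a validation loss (lower is better) as the selection criterion is an assumption, since the quoted text does not name the metric.

```python
from typing import Callable, Tuple, TypeVar

# Hypothetical stand-in for router weights; assumed immutable (or freshly
# returned each epoch) so that keeping a reference preserves the checkpoint.
State = TypeVar("State")


def select_best_checkpoint(
    initial_state: State,
    train_one_epoch: Callable[[State], State],
    validation_loss: Callable[[State], float],
    num_epochs: int = 5,
) -> Tuple[int, State, float]:
    """Train for num_epochs and return (epoch, state, loss) of the epoch
    with the lowest validation loss (assumed selection criterion)."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    state = initial_state
    best_epoch, best_state, best_loss = 0, initial_state, float("inf")
    for epoch in range(1, num_epochs + 1):
        state = train_one_epoch(state)   # one pass over the training data
        loss = validation_loss(state)    # score on the held-out validation split
        if loss < best_loss:             # strict '<' keeps the earliest best epoch
            best_epoch, best_state, best_loss = epoch, state, loss
    return best_epoch, best_state, best_loss
```

With a toy scalar "model" whose validation loss is minimized at 3.0, `select_best_checkpoint(0.0, lambda w: w + 1.0, lambda w: (w - 3.0) ** 2)` selects the third epoch. In a real training loop the state would typically be a copied snapshot of the model parameters rather than a live reference.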