Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

Authors: Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In experiments our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality." (A minimal routing sketch follows the table.)
Researcher Affiliation | Collaboration | Dujian Ding (University of British Columbia), Ankur Mallick (Microsoft), Chi Wang (Microsoft), Robert Sim (Microsoft), Subhabrata Mukherjee (Hippocratic AI), Victor Rühle (Microsoft), Laks V. S. Lakshmanan (University of British Columbia), Ahmed Hassan Awadallah (Microsoft)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "We have made our source code available at https://github.com/m365-core/hybrid_llm_routing."
Open Datasets | Yes | "We use the MixInstruct dataset from (Jiang et al., 2023) to evaluate the effectiveness of different routing strategies. MixInstruct consists of a wide range of tasks (e.g., question answering, summarization, information extraction) and enables us to train a generic router that will be effective across different scenarios. We present additional information about this dataset in Appendix B. [...] We uniformly sample 10k training examples from the training split of MixInstruct, for each of which we generate 10 responses from all investigated LLMs."
Dataset Splits | Yes | "Our validation and test splits are the same as the MixInstruct dataset, which consist of 5k instruction examples separately." (A data-preparation sketch covering the sampling and splits follows the table.)
Hardware Specification | Yes | "All experiments are conducted with 1 NVIDIA A100 GPU of 80GB GPU RAM."
Software Dependencies | No | The paper mentions software components like 'DeBERTa-v3-large', 'LangChain scoring evaluator', and 'GPT-4' but does not provide specific version numbers for ancillary software dependencies such as programming languages or libraries.
Experiment Setup | Yes | "We train each router with the corresponding loss from Section 3 for 5 epochs and use the validation set to choose the best checkpoints for final evaluation." (A training-loop sketch follows the table.)
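
The Research Type row quotes the paper's headline result: routing up to 40% of queries away from the large model with no quality drop. The sketch below shows the general shape of such threshold-based routing, assuming a DeBERTa-style router that scores how likely the small model's response is to match the large model's quality. The checkpoint path, `route` function, and `THRESHOLD` value are illustrative assumptions, not the paper's exact API.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
router = AutoModelForSequenceClassification.from_pretrained(
    "path/to/trained_router", num_labels=1
)  # hypothetical fine-tuned router checkpoint
router.eval()

THRESHOLD = 0.5  # illustrative; in practice tuned on the validation split

def route(query: str) -> str:
    """Return 'small' or 'large' depending on the router's quality score."""
    inputs = tokenizer(query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = torch.sigmoid(router(**inputs).logits).item()
    return "small" if score >= THRESHOLD else "large"

# Only queries routed to "large" incur a call to the expensive model,
# which is where the reported cost savings come from.
```

Raising the threshold sends more traffic to the small model (lower cost, more quality risk); lowering it does the opposite.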
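
The Open Datasets and Dataset Splits rows describe uniformly sampling 10k training examples and reusing MixInstruct's 5k validation and test splits. A minimal data-preparation sketch is below; the Hugging Face dataset id "llm-blender/mix-instruct" and the split names follow the public LLM-Blender release and are assumptions here, not details quoted from the paper.

```python
from datasets import load_dataset

# Assumed dataset id and split names for the MixInstruct release.
mix = load_dataset("llm-blender/mix-instruct")

train = mix["train"].shuffle(seed=0).select(range(10_000))  # 10k uniform sample
val, test = mix["validation"], mix["test"]                  # 5k examples each

print(len(train), len(val), len(test))
```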
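
The Experiment Setup row specifies 5 training epochs with validation-based checkpoint selection. The sketch below shows one way such a loop could look, assuming each example carries a tokenized `inputs` dict and a scalar quality label in [0, 1]; the BCE-style loss stands in for the router losses of Section 3, and the optimizer, batch size, and learning rate are illustrative, not taken from the paper.

```python
import torch
from torch.utils.data import DataLoader

def train_router(router, train_set, val_set, epochs=5, lr=1e-5):
    opt = torch.optim.AdamW(router.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    best_state, best_val = None, float("inf")
    for _ in range(epochs):
        router.train()
        for batch in DataLoader(train_set, batch_size=16, shuffle=True):
            opt.zero_grad()
            logits = router(**batch["inputs"]).logits.squeeze(-1)
            loss = loss_fn(logits, batch["label"].float())
            loss.backward()
            opt.step()
        # Keep the checkpoint that does best on the validation split.
        router.eval()
        with torch.no_grad():
            val_loss = sum(
                loss_fn(router(**b["inputs"]).logits.squeeze(-1),
                        b["label"].float()).item()
                for b in DataLoader(val_set, batch_size=16))
        if val_loss < best_val:
            best_val = val_loss
            best_state = {k: v.clone() for k, v in router.state_dict().items()}
    router.load_state_dict(best_state)
    return router
```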