Online Cascade Learning for Efficient Inference over Streams
Authors: Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across four benchmarks show that our method parallels LLMs in accuracy while cutting down inference costs by as much as 90% with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing. |
| Researcher Affiliation | Academia | Lunyiu Nie¹, Zhimin Ding², Erdong Hu², Christopher Jermaine², Swarat Chaudhuri¹ (¹The University of Texas at Austin, ²Rice University). |
| Pseudocode | Yes | Algorithm 1 Online Cascade Learning. (An illustrative cascade sketch appears below the table.) |
| Open Source Code | Yes | Our source code is available at https://github.com/flitternie/online_cascade_learning. |
| Open Datasets | Yes | IMDB. A binary sentiment classification benchmark with 50,000 movie reviews (Maas et al., 2011). Hate Speech. A binary classification dataset consisting of posts from an online forum, annotated with hate and noHate labels (de Gibert et al., 2018). ISEAR. A multi-class emotion detection benchmark encompassing 7,666 samples across seven categories (Joy, Fear, Anger, Sadness, Disgust, Shame, Guilt) (Shao et al., 2015). FEVER. A fact-checking dataset with 6,512 claims manually verified against Wikipedia, labeled as Supported or Refuted (Thorne et al., 2018). |
| Dataset Splits | Yes | To ensure fairness, datasets are split equally, with 50% prepared for training (as distillation labels) and the remaining 50% for testing. All methods are evaluated on the identical test sets. In our experiments, the distilled smaller models are used in isolation without any ensemble or cascade. ... In particular, we used the training set prepared for the knowledge distillation (as mentioned in Section 4) as our online cascade learning method's validation set. |
| Hardware Specification | Yes | The experiments that involved querying Llama 2 70B Chat utilized a single machine equipped with 8 NVIDIA A40 GPUs, each with 48GB of memory, running CUDA 12.0. All the other experiments were conducted on a machine with 4 NVIDIA Quadro RTX 8000 GPUs (48GB memory each) on CUDA 12.2. |
| Software Dependencies | Yes | Using the 65B parameter LLaMA model (Touvron et al., 2023a) and PyTorch 2.1.2, on an Amazon Web Services m6in.16xlarge machine with eight A100 GPUs... The experiments that involved querying Llama 2 70B Chat utilized a single machine equipped with 8 NVIDIA A40 GPUs, each with 48GB of memory, running CUDA 12.0. All the other experiments were conducted on a machine with 4 NVIDIA Quadro RTX 8000 GPUs (48GB memory each) on CUDA 12.2. |
| Experiment Setup | Yes | The detailed hyperparameter settings for online cascade learning are listed in Tables 3 and 4. We tuned the hyperparameters using a grid search method on a separate validation set, which is a standard practice to avoid overfitting. ... Specifically, for the hyperparameters β and µ, we observed that our experimental results are notably robust to variations in β. ... setting BERT-base's batch size to 8, the learning rate to 0.00001, and the number of epochs to 5. (An illustrative configuration sketch of these BERT-base settings also appears below the table.) |
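
The Pseudocode row references the paper's Algorithm 1, which is not reproduced here. As a rough, hedged illustration of the general cascade idea, the Python sketch below uses a single cheap online learner that answers high-confidence inputs itself, defers low-confidence ones to an LLM, and trains on the LLM's labels. The featurizer `embed`, the stand-in `llm_predict`, and the 0.9 confidence threshold are hypothetical placeholders, not the authors' implementation or their learned deferral policy.

```python
"""Minimal sketch of a confidence-thresholded online cascade (illustrative only)."""
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1])            # e.g. binary sentiment labels
CONFIDENCE_THRESHOLD = 0.9            # hypothetical deferral threshold

small_model = SGDClassifier(loss="log_loss")  # cheap model that supports online updates
is_fitted = False                     # predict_proba requires at least one partial_fit


def embed(text: str) -> np.ndarray:
    """Placeholder featurizer; a real system would use proper text embeddings."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=(1, 64))


def llm_predict(text: str) -> int:
    """Placeholder for an expensive LLM call that returns a label."""
    return int("good" in text.lower())


def classify(text: str) -> int:
    """Answer cheaply when confident; otherwise defer to the LLM and learn from it."""
    global is_fitted
    x = embed(text)
    if is_fitted:
        proba = small_model.predict_proba(x)[0]
        if proba.max() >= CONFIDENCE_THRESHOLD:
            return int(small_model.classes_[proba.argmax()])  # cheap path, no LLM call
    # Defer to the LLM and use its label as online supervision (distillation).
    label = llm_predict(text)
    small_model.partial_fit(x, [label], classes=CLASSES)
    is_fitted = True
    return label


if __name__ == "__main__":
    stream = ["a good movie", "terrible plot", "good acting overall"]
    print([classify(t) for t in stream])
```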
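For the Experiment Setup row, the quoted BERT-base settings (batch size 8, learning rate 0.00001, 5 epochs) could be written, for example, as a Hugging Face `TrainingArguments` configuration. This is only a hypothetical sketch: the `output_dir` path is made up, and any argument not listed stays at the library default rather than a value from the paper.

```python
# Hypothetical expression of the quoted BERT-base settings; not the authors' config.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-distilled",   # made-up output path
    per_device_train_batch_size=8,      # batch size 8, as quoted
    learning_rate=1e-5,                 # 0.00001, as quoted
    num_train_epochs=5,                 # 5 epochs, as quoted
)
```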