Efficient Edge Inference by Selective Query
Authors: Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the ImageNet dataset, our proposed method deployed on a micro-controller unit exhibits a 25% reduction in latency compared to cloud-only processing while suffering no excess loss. Our experiments include (a) MCU and GPU (see Sec. 3.1), (b) Mobile Devices and GPUs (see Sec. 3.1), (c) on the same device (see Sec. 3.2, Appendix A.11.1). We run extensive experiments on benchmark datasets to show that the hybrid design reduces inference latency as well as energy consumption per inference. In this section, first, we train hybrid models for resource-constrained MCU devices, thus demonstrating the effectiveness of hybrid training. Next, we show that hybrid models can be adapted to resource-rich edge devices such as mobile phones. Next, we probe various aspects of our framework through ablations, including (A) validation on other datasets, (B) sensitivity of the solution to small communication latencies, and (C) effectiveness as an abstaining classifier for situations when a global model may be unavailable. Finally, we discuss a simple joint architecture search method for finding hybrid architectures with better performance than off-the-shelf architectures (see Appendix A.3, A.4 for details). Experimental Setup. We focus on the large-scale ImageNet dataset (Russakovsky et al., 2015), consisting of 1.28M train and 50K validation images. (A hedged sketch of this selective-query routing appears after the table.) |
| Researcher Affiliation | Collaboration | Anil Kag, Boston University, anilkag@bu.edu; Igor Fedorov, Meta AI, ifedorov@meta.com; Aditya Gangrade, Carnegie Mellon University, agangra2@andrew.cmu.edu; Paul Whatmough, Qualcomm AI Research, pwhatmou@qti.qualcomm.com; Venkatesh Saligrama, Boston University, srv@bu.edu |
| Pseudocode | Yes | Algorithm 1 Training Hybrid Models; Algorithm 2 Evolutionary Joint Architecture Search; Algorithm 3 End-to-end Hybrid Procedure; Algorithm 4 Tuning Routing Model. |
| Open Source Code | Yes | Our code is available at https://github.com/anilkagak2/Hybrid_Models |
| Open Datasets | Yes | Experimental Setup. We focus on the large-scale ImageNet dataset (Russakovsky et al., 2015), consisting of 1.28M train and 50K validation images. CIFAR-100 (Krizhevsky, 2009). This is a 100-way image classification dataset consisting of images with 32×32×3 pixels. It has 50K train and 10K test images. IMDb (Maas et al., 2011). This is a sentiment classification dataset with two classes (positive and negative). It consists of raw text of movie reviews. There are 25K train and 25K test data points. |
| Dataset Splits | Yes | Experimental Setup. We focus on the large-scale ImageNet dataset (Russakovsky et al., 2015), consisting of 1.28M train and 50K validation images. CIFAR-100 (Krizhevsky, 2009)... It has 50K train and 10K test images. IMDb (Maas et al., 2011)... There are 25K train and 25K test data points. |
| Hardware Specification | Yes | On the ImageNet dataset, our proposed method deployed on a micro-controller unit exhibits a 25% reduction in latency. Table 1: Device & Model Characteristics: Edge (STM32F746 MCU), Cloud (V100 GPU). We use a V100 GPU as the cloud device. It has 16GB VRAM, and the server associated with this GPU has 1TB storage. We use an STM32F746, an ARM Cortex-M7 MCU, as the edge device. It has 320KB SRAM and 1MB Flash storage. |
| Software Dependencies | No | The paper mentions software such as the "TFLite model", "TensorFlow Lite for Microcontrollers (TFLM)", and the "Hugging Face library", but does not provide specific version numbers for these software components. (A hedged TFLite export sketch appears after the table.) |
| Experiment Setup | Yes | Experimental Setup. We focus on the large-scale ImageNet dataset (Russakovsky et al., 2015), consisting of 1.28M train and 50K validation images. We follow standard data augmentation (mirroring, resize, and crop) for training and a single crop for testing. We borrow the pre-trained baselines from their public implementations (see Sec. A.6.2). Appendix A.6.1 lists our hyperparameter settings. A.6.1 HYPER-PARAMETER SETTINGS. We use SGD with momentum as the default optimizer in all our experiments. We initialize our hybrid models from the corresponding pre-trained models and use a learning rate of 1e-4 for learning the base and global models. We use a learning rate of 1e-2 for learning the routing network. We decay the learning rate using a cosine learning rate scheduler. As recommended in earlier works, we use a weight decay of 1e-5. We set the number of epochs to 50. We use a batch size of 256 in our experiments. (A hedged sketch of this optimizer setup appears after the table.) |
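
The hybrid design quoted in the Research Type row pairs a small on-device base model with a large cloud-hosted global model and a routing network that decides, per input, whether the local prediction suffices or the cloud should be queried. Below is a minimal sketch of that selective-query inference loop, assuming PyTorch-style modules named `base_model`, `router`, and `global_model`, a base model that also exposes its features, and an illustrative threshold `tau`; none of these names or interfaces come from the paper's released code.

```python
# Minimal selective-query (hybrid) inference sketch. Assumptions: `base_model`
# returns (logits, features), `router` maps features to a scalar "hard example"
# score, and `global_model` stands in for the remote large model. Single-image
# batches are assumed for the .item() call.
import torch

@torch.no_grad()
def hybrid_predict(x, base_model, router, global_model, tau=0.5):
    base_logits, features = base_model(x)          # cheap edge forward pass
    route_score = torch.sigmoid(router(features))  # estimated probability the sample is "hard"
    if route_score.item() < tau:
        return base_logits.argmax(dim=-1)          # answer locally, no communication cost
    # Otherwise pay the communication + cloud latency for the large model;
    # in a real deployment this call would be an RPC to the cloud-hosted network.
    global_logits = global_model(x)
    return global_logits.argmax(dim=-1)
```

Raising `tau` keeps more queries on the edge (lower latency and energy, possibly lower accuracy), while lowering it defers more inputs to the cloud; this is the trade-off the paper's latency and accuracy results explore.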
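
The hyper-parameter settings quoted in the Experiment Setup row translate directly into an optimizer configuration. The sketch below mirrors the reported values (SGD with momentum, learning rate 1e-4 for the base and global models, 1e-2 for the routing network, weight decay 1e-5, cosine decay over 50 epochs, batch size 256); the momentum value of 0.9 and the function and module names are assumptions, since they are not stated in the quoted passage.

```python
# Optimizer/scheduler setup matching the hyper-parameters reported in
# Appendix A.6.1. Momentum 0.9 is an assumed default, not quoted in the paper.
import torch

EPOCHS = 50
BATCH_SIZE = 256  # reported batch size

def build_optimizer(base_model, global_model, router):
    optimizer = torch.optim.SGD(
        [
            {"params": base_model.parameters(), "lr": 1e-4},    # fine-tuned from pre-trained weights
            {"params": global_model.parameters(), "lr": 1e-4},
            {"params": router.parameters(), "lr": 1e-2},        # routing network uses a larger lr
        ],
        lr=1e-4,
        momentum=0.9,
        weight_decay=1e-5,
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler
```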
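
Since the paper deploys the edge model as a TFLite model under TensorFlow Lite for Microcontrollers on the STM32F746 (see the Software Dependencies and Hardware Specification rows), a full-integer export step is implied even though the paper does not publish its exact script. The sketch below shows a standard int8 TFLite conversion; the SavedModel directory and the representative-data generator are placeholders, not artifacts from the paper.

```python
# Hedged sketch: export an on-device base model to an int8 TFLite flatbuffer
# suitable for TFLM on a Cortex-M7 MCU. Paths and the calibration generator
# are illustrative; this is not the paper's actual export pipeline.
import tensorflow as tf

def export_int8_tflite(saved_model_dir, rep_data_gen, out_path="base_model_int8.tflite"):
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data_gen          # yields calibration input batches
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    with open(out_path, "wb") as f:
        f.write(converter.convert())
    return out_path
```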