Selective Network Linearization for Efficient Private Inference
Authors: Minsu Cho, Ameya Joshi, Brandon Reagen, Siddharth Garg, Chinmay Hegde
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2× less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. |
| Researcher Affiliation | Academia | Minsu Cho¹, Ameya Joshi¹, Siddharth Garg¹, Brandon Reagen¹, Chinmay Hegde¹; ¹New York University Tandon School of Engineering, New York. |
| Pseudocode | Yes | Algorithm 1 SNL: Selective Network Linearization (a hedged sketch of the gating idea appears after the table) |
| Open Source Code | Yes | Public code is available at https://github.com/NYU-DICE-Lab/selective_network_linearization. |
| Open Datasets | Yes | We focus on standard image classification datasets (CIFAR-10/100 and Tiny ImageNet). |
| Dataset Splits | Yes | CIFAR-10 has 10 output classes with 5000 training images and 1000 test images per class, while CIFAR-100 has 100 output classes with 500 training images and 100 test images per class. Tiny ImageNet has 200 output classes with 500 training images and 50 validation images per class. |
| Hardware Specification | No | The paper mentions wall-clock time measurements but does not specify the hardware (e.g., GPU/CPU models) used to obtain them. |
| Software Dependencies | No | The paper mentions optimizers like SGD and ADAM, and the DELPHI protocol, but does not provide specific software dependencies with version numbers (e.g., PyTorch version, Python version, library versions). |
| Experiment Setup | Yes | We first pre-train networks on CIFAR-10/100 using SGD with initial learning rate 0.1, momentum 0.9, weight decay 0.0005, and batch size 256, decaying the learning rate by a factor of 0.1 at epochs 80 and 120. For Tiny ImageNet, we use the same hyperparameters, except that we train for 100 epochs and decay the learning rate at epochs 50 and 75 with decay factor 0.1. (See the training-schedule sketch after the table.) |
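
The pseudocode row above refers to Algorithm 1 (SNL), which is not reproduced in this summary. As a hedged illustration of the idea the name suggests, the sketch below gates each ReLU with a trainable coefficient and adds an L1 penalty so that low-importance activations can be driven toward the identity; the class name `GatedReLU`, the per-activation gate shape, and the penalty weight `lam` are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedReLU(nn.Module):
    """Hypothetical SNL-style activation: a trainable gate c blends
    ReLU(x) (c near 1) with the identity map (c near 0)."""
    def __init__(self, channels, height, width):
        super().__init__()
        # One gate per activation, initialized fully nonlinear.
        self.c = nn.Parameter(torch.ones(1, channels, height, width))

    def forward(self, x):
        return self.c * F.relu(x) + (1.0 - self.c) * x

    def l1_penalty(self):
        # Lasso-style term that pushes gates toward 0 (linearization).
        return self.c.abs().sum()

def snl_objective(logits, targets, model, lam=1e-4):
    """Task loss plus an L1 penalty over all gates; `lam` is a
    hypothetical regularization strength, not taken from the paper."""
    penalty = sum(m.l1_penalty() for m in model.modules()
                  if isinstance(m, GatedReLU))
    return F.cross_entropy(logits, targets) + lam * penalty
```

Under this reading, gates that end up near zero would mark ReLUs that can be replaced by linear units, which is consistent with the paper reporting accuracy at fixed ReLU budgets (e.g., iso-ReLU count at 50K).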
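
The experiment-setup row maps onto a standard SGD-plus-step-decay recipe. The minimal PyTorch sketch below uses only the quoted hyperparameters (learning rate 0.1, momentum 0.9, weight decay 0.0005, decay factor 0.1 at epochs 80/120 for CIFAR-10/100 or 50/75 for Tiny ImageNet); the helper name `make_pretraining_optimizer` and the commented training loop are hypothetical.

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def make_pretraining_optimizer(model, dataset="cifar"):
    """Optimizer and schedule matching the reported pre-training setup
    (batch size 256 would be set on the DataLoader, not shown here)."""
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9,
                    weight_decay=5e-4)
    # CIFAR-10/100: decay at epochs 80 and 120; Tiny ImageNet: 50 and 75.
    milestones = [80, 120] if dataset == "cifar" else [50, 75]
    scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    return optimizer, scheduler

# Usage sketch (train_one_epoch is a hypothetical helper):
# optimizer, scheduler = make_pretraining_optimizer(model, dataset="cifar")
# for epoch in range(num_epochs):
#     train_one_epoch(model, train_loader, optimizer)
#     scheduler.step()  # step once per epoch
```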