DeepReDuce: ReLU Reduction for Fast Private Inference
Authors: Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to the state-of-the-art for private inference, DeepReDuce improves accuracy and reduces ReLU count by up to 3.5% (iso-ReLU count) and 3.5× (iso-accuracy), respectively. We perform our experiments on the CIFAR-100 (Krizhevsky et al., 2012) and TinyImageNet (Le & Yang, 2015; Yao & Miller, 2015) datasets. |
| Researcher Affiliation | Academia | Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen (New York University, New York, USA). Correspondence to: Nandan Kumar Jha <nj2049@nyu.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release for the methodology described. |
| Open Datasets | Yes | We perform our experiments on the CIFAR-100 (Krizhevsky et al., 2012) and TinyImageNet (Le & Yang, 2015; Yao & Miller, 2015) datasets. |
| Dataset Splits | No | The paper mentions "training and test images" for CIFAR-100 and "training and 50 test/validation images" for TinyImageNet, but it does not specify exact split percentages or counts for training, validation, and test sets, nor does it reference predefined splits with citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper describes the training process and mentions using Knowledge Distillation, but it does not specify any software libraries, frameworks, or solvers with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Networks are trained with an initial learning rate of 0.1, a mini-batch size of 128, a fixed momentum of 0.9, and a weight decay of 0.0004. We train networks for 120 epochs on both the CIFAR-100 and TinyImageNet datasets, reducing the learning rate by a factor of 10 every 30th epoch. For training on CIFAR-10, we use a cosine learning-rate schedule and train the networks for 150 epochs. When using knowledge distillation, we set the temperature to 4 and the relative weight of the cross-entropy loss on hard targets to 0.9. |
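
The paper does not name its software stack, but the quoted hyperparameters map directly onto a standard SGD-with-step-decay training loop. Below is a minimal sketch, assuming PyTorch; the `build_training_setup` and `train` helpers are illustrative names, not the authors' code.

```python
# Minimal sketch of the training recipe quoted in the "Experiment Setup" row
# (CIFAR-100 / TinyImageNet case). PyTorch is an assumption; the paper does not
# specify a framework.
import torch
import torch.nn as nn
import torch.optim as optim


def build_training_setup(model: nn.Module):
    # Reported hyperparameters: initial LR 0.1, momentum 0.9, weight decay 4e-4,
    # LR reduced by a factor of 10 every 30th epoch.
    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=4e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    return optimizer, scheduler


def train(model: nn.Module, loader, device="cuda", epochs=120):
    # 120 epochs; the mini-batch size of 128 is set when constructing `loader`.
    criterion = nn.CrossEntropyLoss()
    optimizer, scheduler = build_training_setup(model)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```

For the CIFAR-10 case described in the same row, the `StepLR` schedule would be swapped for `optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)` and `epochs` set to 150.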
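
The knowledge-distillation settings (temperature 4, relative weight 0.9 on the hard-target cross-entropy) are commonly realized with the objective sketched below. The weighting convention and the T² scaling of the soft term follow Hinton et al.'s standard recipe and are assumptions; the paper only reports the two hyperparameter values.

```python
# Hedged sketch of a knowledge-distillation loss with the reported settings:
# temperature T = 4 and relative weight alpha = 0.9 on the hard-target term.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Hard-target cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-target KL divergence between temperature-scaled distributions,
    # rescaled by T^2 as in the standard formulation (an assumption here).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```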