Private Model Compression via Knowledge Distillation
Authors: Ji Wang, Weidong Bao, Lichao Sun, Xiaomin Zhu, Bokai Cao, Philip S. Yu
AAAI 2019, pp. 1190-1197 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of empirical evaluations as well as the implementation on an Android mobile device show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. We evaluate the proposed RONA by using three standard benchmarks that are widely used in knowledge distillation works. The results demonstrate the effectiveness of the above novel methods, bringing significant improvement in training small models with rigorous privacy guarantee. |
| Researcher Affiliation | Collaboration | Ji Wang,1 Weidong Bao,1 Lichao Sun,2 Xiaomin Zhu,1,3 Bokai Cao,4 Philip S. Yu2,5 1College of Systems Engineering, National University of Defense Technology, Changsha, China 2Department of Computer Science, University of Illinois at Chicago, Chicago, USA 3State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China 4Facebook Inc., Menlo Park, USA 5Institute for Data Science, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1: Compact Student Model Training; Algorithm 2: Function Privacy Sanitize; Algorithm 3: Function Query Select (see the sanitization sketch after this table) |
| Open Source Code | Yes | Our code is open-sourced at https://github.com/jwanglearn/PrivateCompress. |
| Open Datasets | Yes | The framework RONA is evaluated based on three popular image datasets: MNIST (Le Cun et al. 1998), SVHN (Netzer et al. 2011), and CIFAR-10 (Krizhevsky and Hinton 2009). |
| Dataset Splits | No | The paper describes splitting training samples into 'public data' and 'sensitive data' (e.g., '80% training samples as the public data') but does not specify a validation split for hyperparameter tuning or early stopping, nor does it clearly describe full train/validation/test splits for all datasets (see the split sketch after this table). |
| Hardware Specification | No | The paper mentions hardware for mobile device deployment ('HUAWEI HONOR 8 equipped with ARM Cortex-A53@2.3GHz and Cortex A53@1.81GHz') but does not specify the hardware used for training the deep learning models in the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | Hint learning epochs. It can be observed from Fig. 2(a) that the accuracy of the student model increases when the hint learning epoch ascends. ... Iterations for distillation learning. The total epochs of distillation learning are determined by the rounds of iterations R and the epochs per iteration Td. ... Batch size. ... we set the batch size as 512 in our experiments. ... Noise scale. ... Even when the noise scale is relatively large (σ = 20)... (see the configuration sketch after this table) |
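
The "Pseudocode" row above names a Privacy Sanitize routine but does not reproduce it. As a point of reference only, the following is a minimal sketch of how a Gaussian-mechanism sanitization step of this kind is commonly implemented (clip to bound sensitivity, then add calibrated noise). The function name, defaults, and clipping scheme are assumptions for illustration, not taken from the paper or its repository.

```python
import numpy as np

def privacy_sanitize(batch_values, clip_bound=4.0, noise_scale=20.0, rng=None):
    """Hypothetical Gaussian-mechanism sanitizer (not RONA's actual code).

    Clips each value into [-clip_bound, clip_bound] to bound sensitivity,
    then adds zero-mean Gaussian noise with standard deviation
    noise_scale * clip_bound before the values are released.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(batch_values, dtype=np.float64)
    clipped = np.clip(values, -clip_bound, clip_bound)
    noise = rng.normal(0.0, noise_scale * clip_bound, size=clipped.shape)
    return clipped + noise
```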
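
Relatedly, the "Dataset Splits" row notes that the paper partitions each training set into public and sensitive portions (e.g., 80% public) without a stated validation split. A reproducer might express such a split as below; the seed, the function name, and the decision not to carve out a validation set are assumptions for illustration.

```python
import numpy as np

def split_public_sensitive(num_samples, public_fraction=0.8, seed=0):
    """Illustrative index split into 'public' and 'sensitive' subsets.

    Mirrors the 80%/20% partition described in the paper; no validation
    split is created because the paper does not specify one.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    cut = int(public_fraction * num_samples)
    return indices[:cut], indices[cut:]  # (public indices, sensitive indices)
```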
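
Finally, the hyperparameters quoted in the "Experiment Setup" row (batch size 512, noise scale up to σ = 20, hint-learning epochs, and R distillation rounds of Td epochs each) could be gathered into one configuration object when re-running the experiments. The sketch below is hypothetical: only the batch size and the σ = 20 noise scale are stated in the excerpt, and the remaining fields are placeholders whose values would have to come from the paper's figures or the released code.

```python
from dataclasses import dataclass

@dataclass
class RonaExperimentConfig:
    """Hypothetical grouping of hyperparameters from the 'Experiment Setup' row."""
    hint_epochs: int             # swept in Fig. 2(a); no single value quoted
    distill_rounds: int          # R, rounds of distillation iterations
    epochs_per_round: int        # Td, epochs per distillation iteration
    batch_size: int = 512        # stated in the excerpt
    noise_scale: float = 20.0    # largest sigma discussed in the excerpt
```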