Robust Model Compression Using Deep Hypotheses
Authors: Omri Armstrong, Ran Gilad-Bachrach
AAAI 2021, pp. 6688-6695
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the success of this algorithm empirically by compressing neural networks and random forests into small decision trees, which are interpretable models, and show that they are more accurate and robust than other comparable methods. In addition, our empirical study shows that our method outperforms Knowledge Distillation on DNN to DNN compression. |
| Researcher Affiliation | Academia | Omri Armstrong, Ran Gilad-Bachrach Tel Aviv University Ramat Aviv 699780, Tel Aviv armstrong@mail.tau.ac.il, rgb@tauex.tau.ac.il |
| Pseudocode | Yes | Algorithm 1: Multiclass Empirical Median Optimization (MEMO) Algorithm (Section 3) and Algorithm 2: Compact Robust Estimated Median Belief Optimization (CREMBO) (Section 4). |
| Open Source Code | Yes | Our code is available at https://github.com/TAU-ML-well/Rubust-Model-Compression. |
| Open Datasets | Yes | We evaluated the CREMBO algorithm on five classification tasks (Table 1) from the UCI repository (Dua and Graff 2017)... The models were trained on the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009). |
| Dataset Splits | Yes | To find the median tree, we split S_train into train and validation sets, S_train and S_val, with a random 15% split and run the CREMBO algorithm. (Section 5.1). Then we divided the training set into a train and validation set with a random 10% split. (Section 5.2). A hedged sketch of these splits appears after the table. |
| Hardware Specification | No | The paper mentions software used (PyTorch, scikit-learn) and training parameters but does not specify any hardware details like GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al. 2017)' and the 'scikit-learn (Pedregosa et al. 2011) package' but does not specify their version numbers. |
| Experiment Setup | Yes | The DNNs are all fully connected with two hidden layers of 128 units with ReLU activation functions. They were trained with an ADAM optimizer with default parameters and batch size of 32 for 10 epochs. (Section 5.1). We used ADAM optimizer, batch size of 128, learning rate of 0.01 for 60 epochs and then learning rate of 0.001 for another 30 epochs. (Section 5.2). A hedged PyTorch sketch of these configurations appears after the table. |
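
The Research Type and Pseudocode rows describe compressing DNNs and random forests into small, interpretable decision trees via the CREMBO and MEMO algorithms. The sketch below shows only the generic hard-label distillation setting (a teacher model relabels the training data and a small tree is fit to those labels), not the authors' CREMBO or MEMO procedures; the synthetic data and all hyperparameters are illustrative assumptions.

```python
# Generic teacher-to-tree compression baseline, NOT the paper's CREMBO/MEMO
# algorithms. Data, depth, and forest size are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # synthetic stand-in labels

# Teacher: a random forest trained on the original labels.
teacher = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Student: a small decision tree fit to the teacher's predicted labels.
student = DecisionTreeClassifier(max_depth=4, random_state=0)
student.fit(X, teacher.predict(X))
```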
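For the Open Datasets row, a minimal data-loading sketch, assuming torchvision is used to fetch CIFAR-10 (the excerpt does not say how the data was obtained); the five UCI tasks would be loaded separately, e.g. from the UCI repository files.

```python
# Hedged sketch of fetching CIFAR-10; torchvision is an assumption, not
# something the quoted text confirms.
from torchvision import datasets, transforms

transform = transforms.ToTensor()
cifar_train = datasets.CIFAR10(root="./data", train=True, download=True,
                               transform=transform)
cifar_test = datasets.CIFAR10(root="./data", train=False, download=True,
                              transform=transform)
```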
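For the Dataset Splits row, a minimal sketch of the reported random 15% (Section 5.1) and 10% (Section 5.2) validation splits, assuming scikit-learn's train_test_split; the stand-in arrays and the random seed are illustrative, not taken from the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper's S_train comes from the UCI tasks.
X_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=1000)

# Section 5.1: hold out a random 15% of S_train as a validation set.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.15, random_state=0)

# Section 5.2: a random 10% validation split of the training set.
X_tr2, X_val2, y_tr2, y_val2 = train_test_split(
    X_train, y_train, test_size=0.10, random_state=0)
```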
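For the Experiment Setup row, a hedged PyTorch sketch of the two reported training configurations. The input/output dimensions, the data loaders, and the CIFAR-10 teacher architecture are placeholders (the excerpt does not restate them); only the layer widths, activation, optimizer, batch sizes, learning-rate schedule, and epoch counts come from the quoted text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

N_FEATURES, N_CLASSES = 20, 2  # hypothetical dimensions; the UCI tasks vary


def build_mlp():
    # Section 5.1 teacher: two fully connected hidden layers of 128 ReLU units.
    return nn.Sequential(
        nn.Linear(N_FEATURES, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, N_CLASSES),
    )


def train_section_5_1(model, X, y):
    # Adam with default parameters, batch size 32, 10 epochs.
    loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(10):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()


def train_section_5_2(model, loader):
    # Adam, batch size 128 (set in `loader`), learning rate 0.01 for 60
    # epochs, then 0.001 for another 30 epochs (90 in total).
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60], gamma=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(90):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()
```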