Robust Model Compression Using Deep Hypotheses

Authors: Omri Armstrong, Ran Gilad-Bachrach

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the success of this algorithm empirically by compressing neural networks and random forests into small decision trees, which are interpretable models, and show that they are more accurate and robust than other comparable methods. In addition, our empirical study shows that our method outperforms Knowledge Distillation on DNN to DNN compression.
Researcher Affiliation | Academia | Omri Armstrong, Ran Gilad-Bachrach, Tel Aviv University, Ramat Aviv 699780, Tel Aviv; armstrong@mail.tau.ac.il, rgb@tauex.tau.ac.il
Pseudocode | Yes | Algorithm 1: Multiclass Empirical Median Optimization (MEMO) Algorithm (Section 3) and Algorithm 2: Compact Robust Estimated Median Belief Optimization (CREMBO) (Section 4).
Open Source Code | Yes | Our code is available at https://github.com/TAU-ML-well/Rubust-Model-Compression.
Open Datasets | Yes | We evaluated the CREMBO algorithm on five classification tasks (Table 1) from the UCI repository (Dua and Graff 2017)... The models were trained on the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009).
Dataset Splits | Yes | To find the median tree, we split S_train into train and validation sets, S_train, S_val, with a random 15% split and run the CREMBO algorithm. (Section 5.1). Then we divided the training set into train and validation sets with a random 10% split. (Section 5.2). (A sketch of these splits appears after this table.)
Hardware Specification | No | The paper mentions the software used (PyTorch, scikit-learn) and training parameters, but does not specify hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al. 2017)' and the 'scikit-learn (Pedregosa et al. 2011) package' but does not specify their version numbers.
Experiment Setup | Yes | The DNNs are all fully connected with two hidden layers of 128 units with ReLU activation functions. They were trained with an ADAM optimizer with default parameters and batch size of 32 for 10 epochs. (Section 5.1). We used ADAM optimizer, batch size of 128, learning rate of 0.01 for 60 epochs and then learning rate of 0.001 for another 30 epochs. (Section 5.2).
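To make the quoted experiment setup concrete, below is a minimal PyTorch sketch assuming the Section 5.1 configuration (two hidden layers of 128 ReLU units, ADAM with default parameters, batch size 32, 10 epochs) and the Section 5.2 learning-rate schedule (0.01 for 60 epochs, then 0.001 for 30 more). The input/output sizes, the data tensors, and the use of MultiStepLR to realize the schedule are illustrative assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Section 5.1: fully connected DNN, two hidden layers of 128 units, ReLU activations.
# n_features / n_classes are placeholders; they depend on the UCI task.
def build_dnn(n_features: int, n_classes: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(n_features, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, n_classes),
    )

def train_section_5_1(model, X, y):
    # ADAM with default parameters, batch size 32, 10 epochs (as quoted above).
    loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(10):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model

def train_section_5_2(model, loader):
    # ADAM, batch size 128 (set when building `loader`), lr 0.01 for 60 epochs,
    # then 0.001 for another 30. MultiStepLR is one way to express this schedule;
    # the paper does not name the mechanism used.
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60], gamma=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(90):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()
    return model
```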
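The dataset-splits row above quotes a random 15% train/validation split (Section 5.1) and a random 10% split (Section 5.2). Here is a minimal scikit-learn sketch of such splits, assuming train_test_split and synthetic placeholder data; the paper only states the split percentages.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a UCI task or the CIFAR-10 training set.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 3, size=1000)

# Random 15% validation split (Section 5.1).
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.15, random_state=0)

# Random 10% validation split (Section 5.2).
X_tr2, X_val2, y_tr2, y_val2 = train_test_split(X, y, test_size=0.10, random_state=0)
```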