Information-Theoretic Understanding of Population Risk Improvement with Model Compression

Authors: Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal Veeravalli (pp. 3300-3307)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In this section, we provide some real-world experiments to validate our theoretical assertions and the DRHW K-means algorithm. Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10.' (An illustrative PyTorch reconstruction of these two architectures appears after this table.)
Researcher Affiliation | Collaboration | Yuheng Bu (1), Weihao Gao (1), Shaofeng Zou (2), Venugopal V. Veeravalli (1); (1) University of Illinois at Urbana-Champaign, Urbana, IL, USA; (2) University at Buffalo, The State University of New York, Buffalo, NY, USA. ... Currently with Bytedance Inc., Bellevue, WA, USA.
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | 'All the codes of our experiments are available at the following link https://github.com/wgao9/weight-quant.'
Open Datasets | Yes | 'Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10.'
Dataset Splits | No | The paper states: 'We use 10% of the training data to train the model for MNIST, and use 20% of the training data to train the model for CIFAR10.' This fixes the fraction of the training data used for training, but does not give explicit train/validation/test splits. (A sketch of such subsampling appears after this table.)
Hardware Specification | No | The paper provides no hardware details for its experiments, such as GPU models, CPU types, or cloud instance specifications.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version or any other versioned software dependencies.
Experiment Setup | Yes | 'Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10.' ... 'We use 10% of the training data to train the model for MNIST, and use 20% of the training data to train the model for CIFAR10. For each experiment, we use the same number of clusters for each convolutional layer and fully connected layer.' and 'Diameter-regularized Hessian-weighted K-means with different β on the MNIST dataset with K = 7.' (A hedged sketch of such a quantizer appears at the end of this table.)
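
For orientation, here is a minimal PyTorch sketch of the two architectures as the paper describes them: a three-layer fully connected network for MNIST, and a CNN with five convolutional layers and three linear layers for CIFAR10. Only the layer counts come from the paper; the widths, kernel sizes, pooling, and activations below are assumptions, and the exact definitions live in the linked repository.

```python
import torch.nn as nn

# Hypothetical reconstruction: the paper fixes only the layer counts,
# so every width and kernel size here is an assumption.

class MNISTNet(nn.Module):
    """Three-layer fully connected network for 28x28 MNIST digits."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 10),
        )

    def forward(self, x):
        return self.net(x)


class CIFAR10Net(nn.Module):
    """Five convolutional layers followed by three linear layers."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512), nn.ReLU(),  # 32x32 input -> 4x4 after 3 pools
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```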
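The partial-training-data detail (10% of MNIST, 20% of CIFAR10) is straightforward to reproduce with a random subset. The sketch below assumes a random draw with a fixed seed; the paper does not say how its subsets were selected, so this is illustrative rather than the authors' procedure.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def subsampled_loader(dataset, fraction, batch_size=128, seed=0):
    """Return a DataLoader over a random `fraction` of `dataset`."""
    g = torch.Generator().manual_seed(seed)  # fixed seed: an assumption
    n = int(fraction * len(dataset))
    idx = torch.randperm(len(dataset), generator=g)[:n]
    return DataLoader(Subset(dataset, idx.tolist()),
                      batch_size=batch_size, shuffle=True)

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
cifar = datasets.CIFAR10("data", train=True, download=True,
                         transform=transforms.ToTensor())

mnist_loader = subsampled_loader(mnist, 0.10)  # 10% of MNIST training data
cifar_loader = subsampled_loader(cifar, 0.20)  # 20% of CIFAR10 training data
```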
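Finally, a hedged sketch of what a diameter-regularized, Hessian-weighted K-means quantizer with K clusters and regularization strength β could look like. The objective used here (Hessian-weighted squared error plus a β-weighted distance-to-center term as a proxy for cluster diameter) is an assumption made for illustration; the paper's exact regularizer and update rules should be taken from its equations and the repository linked above.

```python
import numpy as np

def drhw_kmeans(w, h, K=7, beta=0.1, iters=50, seed=0):
    """One plausible Lloyd-style loop for diameter-regularized,
    Hessian-weighted K-means (a sketch, not the paper's exact procedure).

    w    : flat array of weights to quantize
    h    : per-weight Hessian-diagonal importance values, h >= 0
    K    : number of clusters (shared codewords)
    beta : diameter regularization strength
    """
    rng = np.random.default_rng(seed)
    centers = rng.choice(w, size=K, replace=False)
    assign = np.zeros(len(w), dtype=int)
    for _ in range(iters):
        # Assignment: Hessian-weighted squared error plus a beta-weighted
        # distance-to-center proxy for the cluster diameter (assumed form).
        cost = h[:, None] * (w[:, None] - centers[None, :]) ** 2
        cost += beta * np.abs(w[:, None] - centers[None, :])
        assign = cost.argmin(axis=1)
        # Update: Hessian-weighted centroid of each nonempty cluster.
        for k in range(K):
            mask = assign == k
            if mask.any():
                centers[k] = np.average(w[mask], weights=h[mask] + 1e-12)
    return centers[assign], centers, assign

# Hypothetical usage on random data, with K = 7 as in the MNIST experiment:
w = np.random.randn(1000)
hdiag = np.random.rand(1000)
w_quantized, codebook, labels = drhw_kmeans(w, hdiag, K=7, beta=0.1)
```

Applying the same number of clusters to every convolutional and fully connected layer, as the setup row states, would amount to calling such a routine once per layer with the same K.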