Information-Theoretic Understanding of Population Risk Improvement with Model Compression
Authors: Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal Veeravalli
AAAI 2020, pp. 3300-3307
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide some real-world experiments to validate our theoretical assertions and the DRHW K-means algorithm. Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10. |
| Researcher Affiliation | Collaboration | Yuheng Bu¹, Weihao Gao¹, Shaofeng Zou², Venugopal V. Veeravalli¹; ¹University of Illinois at Urbana-Champaign, Urbana, IL, USA; ²University at Buffalo, The State University of New York, Buffalo, NY, USA ... Currently with Bytedance Inc., Bellevue, WA, USA |
| Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | All the codes of our experiments are available at the following link https://github.com/wgao9/weight-quant. |
| Open Datasets | Yes | Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10. |
| Dataset Splits | No | The paper states: 'We use 10% of the training data to train the model for MNIST, and use 20% of the training data to train the model for CIFAR10.' This specifies the fraction of the training data used for training, but does not give explicit training/validation/test splits for either dataset (a subset-loading sketch follows this table). |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | Our experiments include compression of: (i) a three-layer fully connected network on MNIST; and (ii) a convolutional neural network with five conv layers and three linear layers on CIFAR10. ... We use 10% of the training data to train the model for MNIST, and use 20% of the training data to train the model for CIFAR10. For each experiment, we use the same number of clusters for each convolutional layer and fully connected layer. ... Diameter-regularized Hessian-weighted K-means with different β on the MNIST dataset with K = 7. (A minimal quantization sketch follows this table.) |
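
As noted in the Pseudocode row, the paper contains no structured algorithm block for the quantizer it evaluates. The following is therefore only a minimal NumPy sketch of what a 1-D diameter-regularized Hessian-weighted K-means could look like: the function name `drhw_kmeans_1d`, the greedy diameter-growth penalty used to involve β, and all parameter defaults are illustrative assumptions, not the authors' method; the exact objective and implementation are in the paper and the linked repository.

```python
import numpy as np

def drhw_kmeans_1d(w, h, k, beta=0.0, n_iter=50, seed=0):
    """Sketch of diameter-regularized Hessian-weighted K-means in 1-D.

    w: flat array of one layer's weights; h: matching diagonal-Hessian
    importances. Each point pays h_i * (w_i - c_j)^2 plus, heuristically,
    beta times the diameter growth it would cause in cluster j. Centroids
    are Hessian-weighted means. The greedy beta penalty only stands in for
    the paper's regularizer; see the paper for the exact objective.
    """
    w = np.asarray(w, dtype=np.float64).ravel()
    h = np.asarray(h, dtype=np.float64).ravel()
    rng = np.random.default_rng(seed)
    centroids = rng.choice(w, size=k, replace=False)
    assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    for _ in range(n_iter):
        # Current extent (min/max) of each cluster; empty clusters collapse
        # to their centroid so the penalty reduces to |w_i - c_j|.
        lo = np.array([w[assign == j].min() if np.any(assign == j) else c
                       for j, c in enumerate(centroids)])
        hi = np.array([w[assign == j].max() if np.any(assign == j) else c
                       for j, c in enumerate(centroids)])
        # Hessian-weighted distortion of assigning point i to cluster j.
        dist = h[:, None] * (w[:, None] - centroids[None, :]) ** 2
        # Diameter growth caused by adding point i to cluster j.
        growth = (np.maximum(hi[None, :], w[:, None])
                  - np.minimum(lo[None, :], w[:, None])) - (hi - lo)[None, :]
        assign = np.argmin(dist + beta * growth, axis=1)
        # Hessian-weighted centroid update: c_j = sum h_i w_i / sum h_i.
        for j in range(k):
            m = assign == j
            if m.any():
                centroids[j] = np.sum(h[m] * w[m]) / np.sum(h[m])
    return centroids[assign], centroids
```

In the quoted setup this would be run per layer with the same number of clusters K for every conv and linear layer (e.g. K = 7 in the MNIST experiment), with `h` taken from some diagonal approximation of the loss Hessian.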
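
Similarly, the training fractions quoted in the Dataset Splits row could be reproduced with a subset loader along these lines. The uniform sampling and fixed seed are assumptions; the paper only states the fractions (10% for MNIST, 20% for CIFAR10).

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def fractional_train_set(dataset, fraction, seed=0):
    """Return a random subset with `fraction` of the training examples.
    The seed and uniform sampling are assumptions; the paper does not
    specify how the subset was drawn."""
    g = torch.Generator().manual_seed(seed)
    n = int(len(dataset) * fraction)
    idx = torch.randperm(len(dataset), generator=g)[:n]
    return Subset(dataset, idx.tolist())

mnist_train = fractional_train_set(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    fraction=0.10)
cifar_train = fractional_train_set(
    datasets.CIFAR10("data", train=True, download=True,
                     transform=transforms.ToTensor()),
    fraction=0.20)
```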