The Description Length of Deep Learning models

Authors: Léonard Blier, Yann Ollivier

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. In this work we explicitly measure how much current deep models actually compress data.
Researcher Affiliation | Collaboration | Léonard Blier, École Normale Supérieure, Paris, France, leonard.blier@normalesup.org; Yann Ollivier, Facebook Artificial Intelligence Research, Paris, France, yol@fb.com
Pseudocode | Yes | Algorithm 2 in Appendix D
Open Source Code | Yes | C. Tallec and L. Blier. Pyvarinf: Variational Inference for PyTorch, 2018. URL https://github.com/ctallec/pyvarinf. (L. Blier is an author of the paper.)
Open Datasets | Yes | Our running example will be image classification on the MNIST (LeCun et al., 1998) and CIFAR10 (Krizhevsky, 2009) datasets.
Dataset Splits | Yes | Table 1: Compression bounds via Deep Learning. Compression bounds given by different codes on two datasets, MNIST and CIFAR10. ... The test accuracy of a model is the accuracy of its predictions on the test set. On MNIST, this provides a codelength of the labels (knowing the inputs) of 24.1 kbits... The corresponding model achieved 95.5% accuracy on the test set. (A sketch of how such codelengths are computed follows the table.)
Hardware Specification | No | The paper does not specify any details about the hardware used for running the experiments (e.g., specific GPU or CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions 'Pyvarinf: Variational Inference for PyTorch' and refers to PyTorch implicitly, but it does not specify concrete version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | Neural networks that give the best variational compression bounds appear to be smaller than networks trained the usual way. We tested various fully connected networks and convolutional networks (Appendix C): the models that gave the best variational compression bounds were small LeNet-like networks. On CIFAR, we tested a simple multilayer perceptron, a shallow network, a small convolutional network, and a VGG convolutional network (Simonyan and Zisserman, 2014), first without data augmentation or batch normalization (VGGa) (Ioffe and Szegedy, 2015), then with both of them (VGGb) (Appendix D).
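The codelength figures quoted above (the 24.1 kbit label codelength on MNIST and the variational compression bounds) are sums of negative log-likelihoods of the labels given the inputs, plus, for the variational bound, a KL penalty for encoding the parameters. The following is a minimal sketch of these quantities, assuming a standard PyTorch classifier and data loader; the names `model` and `loader` are placeholders, and this is not the paper's Pyvarinf code.

```python
import math

import torch
import torch.nn.functional as F


def label_codelength_bits(model, loader, device="cpu"):
    """Codelength (in bits) of the labels given the inputs under the model's
    predictive distribution: sum over examples of -log2 p(y | x).

    `model` (a classifier returning logits) and `loader` (batches of
    (inputs, labels)) are hypothetical placeholders.
    """
    model.eval()
    total_nats = 0.0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            # Summed negative log-likelihood of the true labels, in nats.
            total_nats += F.cross_entropy(logits, y, reduction="sum").item()
    return total_nats / math.log(2)  # convert nats -> bits


def uniform_codelength_bits(n_labels, n_classes=10):
    """Trivial baseline: encode each label uniformly at log2(K) bits per
    label (K = 10 classes for MNIST or CIFAR10)."""
    return n_labels * math.log2(n_classes)


def variational_codelength_bits(expected_data_bits, kl_nats):
    """Variational description-length bound: expected data codelength under
    the weight posterior q (in bits), plus KL(q || prior) converted to bits.
    The KL term would come from a variational wrapper such as Pyvarinf; here
    it is simply taken as a given number of nats."""
    return expected_data_bits + kl_nats / math.log(2)
```

Keeping everything in nats internally and converting once at the end avoids mixing log bases; dividing the result by 1000 gives codelengths in kbits, the unit used in the quoted Table 1.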