Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
Authors: Marton Havasi, Robert Peharz, José Miguel Hernández-Lobato
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method sets new state-of-the-art in neural network compression, as it strictly dominates previous approaches in a Pareto sense: On the benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10, our approach yields the best test performance for a fixed memory budget, and vice versa, it achieves the highest compression rates for a fixed test performance. |
| Researcher Affiliation | Collaboration | Marton Havasi, Department of Engineering, University of Cambridge, mh740@cam.ac.uk; Robert Peharz, Department of Engineering, University of Cambridge, rp587@cam.ac.uk; José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge / Microsoft Research / Alan Turing Institute, jmh233@cam.ac.uk |
| Pseudocode | Yes | Algorithm 1: Minimal Random Coding; Algorithm 2: Minimal Random Code Learning (MIRACLE). A hedged sketch of Algorithm 1 follows the table. |
| Open Source Code | Yes | The code is publicly available at https://github.com/cambridge-mlg/miracle |
| Open Datasets | Yes | The experiments were conducted on two common benchmarks: LeNet-5 on MNIST and VGG-16 on CIFAR-10. |
| Dataset Splits | No | The paper mentions 'optimizing the expected loss on the training set' and evaluating on the 'test set', and Algorithm 2 includes intermediate VARIATIONAL UPDATES(I) steps, but it does not provide explicit percentages, sample counts, or references to a predefined validation split distinct from the training and test data. |
| Hardware Specification | Yes | ∼1 day on a single NVIDIA P100 for VGG |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | For training MIRACLE, we used Adam (Kingma & Ba, 2014) with the default learning rate (10⁻³) and we set ϵ_β₀ = 10⁻⁸ and ϵ_β = 5·10⁻⁵. The local coding goal C_loc was fixed at 20 bits for LeNet-5 and it was varied between 15 and 5 bits for VGG (B was kept constant). For the number of intermediate variational updates I, we used I = 50 for LeNet-5 and I = 1 for VGG, in order to keep training time reasonable (∼1 day on a single NVIDIA P100 for VGG). These values are condensed into the configuration sketch after the table. |
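As context for the pseudocode row, below is a minimal sketch of Algorithm 1 (Minimal Random Coding), reconstructed from the paper's description rather than taken from the authors' released code (linked above). The helper `gauss_logpdf`, the callables `q_log_prob`/`p_log_prob`/`p_sample`, and the toy dimensions are our own assumptions; the key idea is that encoder and decoder share a random seed, so only the C_loc-bit index of the chosen sample needs to be transmitted.

```python
import numpy as np

def gauss_logpdf(w, mu, sigma):
    """Log-density of a factorised Gaussian, summed over the weight dimension."""
    return (-0.5 * ((w - mu) / sigma) ** 2
            - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)).sum(axis=1)

def minimal_random_coding(q_log_prob, p_log_prob, p_sample, c_loc, seed=0):
    """Sketch of Algorithm 1: encode one weight block with ~c_loc bits.

    The decoder regenerates the same K candidates from the shared `seed`,
    so only the c_loc-bit index has to be sent.
    """
    rng = np.random.default_rng(seed)
    K = 2 ** c_loc                          # number of prior samples; the index costs c_loc bits
    candidates = p_sample(rng, K)           # K draws from the prior p(w), shape (K, d)
    log_w = q_log_prob(candidates) - p_log_prob(candidates)  # importance log-weights
    log_w -= log_w.max()                    # stabilise before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum()
    index = rng.choice(K, p=probs)          # sample the index from the normalised weights
    return index, candidates[index]

# Toy usage: a 4-dimensional weight block with Gaussian posterior q and prior p.
d = 4
q_mu, q_sigma = np.full(d, 0.1), np.full(d, 0.5)
index, w_hat = minimal_random_coding(
    q_log_prob=lambda w: gauss_logpdf(w, q_mu, q_sigma),
    p_log_prob=lambda w: gauss_logpdf(w, 0.0, 1.0),
    p_sample=lambda rng, K: rng.standard_normal((K, d)),
    c_loc=12,                               # 2**12 candidates keeps the toy example fast
)
print(f"transmit index {index} (12 bits); decoded block {w_hat}")
```

In the paper this is applied per block of parameters with C_loc bits each (e.g. 20 bits for LeNet-5), which keeps K = 2^C_loc tractable.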
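The reported experiment setup can also be condensed into a configuration sketch. All values come from the quoted setup above; the dictionary key names are illustrative, not identifiers from the released code.

```python
# Hedged summary of the reported hyperparameters; key names are our own.
miracle_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,            # Adam default, as reported
    "eps_beta_0": 1e-8,
    "eps_beta": 5e-5,
    "c_loc_bits": {
        "lenet5_mnist": 20,           # fixed at 20 bits
        "vgg16_cifar10": (15, 5),     # varied between 15 and 5 bits (B kept constant)
    },
    "intermediate_variational_updates_I": {
        "lenet5_mnist": 50,
        "vgg16_cifar10": 1,           # kept low so VGG trains in ~1 day on a single P100
    },
}
```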