Sequential Learning of Neural Networks for Prequential MDL
Authors: Jorg Bornschein, Yazhe Li, Marcus Hutter
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we evaluate approaches for computing prequential description lengths for image classification datasets with neural networks. Considering the computational cost, we find that online-learning with rehearsal has favorable performance compared to the previously widely used block-wise estimation. |
| Researcher Affiliation | Industry | Jorg Bornschein bornschein@deepmind.com Yazhe Li yazhe@deepmind.com Marcus Hutter mhutter@deepmind.com |
| Pseudocode | Yes | Algorithm 1 Mini-batch Incremental Training with Replay Streams |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code. |
| Open Datasets | Yes | We use MNIST (LeCun et al., 2010), EMNIST, CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) and randomly shuffle each into a fixed sequence of examples. |
| Dataset Splits | Yes | At each stage we split the data D_{<s_k} into 90% training and 10% calibration data. Conceptually, we could perform post-calibration by first training the network to convergence and then, with all parameters frozen, replacing the output layer softmax(h) with the calibrated output layer softmax(softplus(β)h), where β is a scalar parameter chosen to minimize the loss on calibration data. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for the experiments, such as GPU or CPU models. It only implicitly alludes to training on a GPU or in a data center and gives no further detail. |
| Software Dependencies | No | The paper mentions software such as the AdamW and RMSProp optimizers and RandAugment for data augmentation, but it does not specify version numbers for any software dependencies (e.g., PyTorch 1.9 or Python 3.8). |
| Experiment Setup | Yes | The hyperparameter intervals depend on the data and are detailed in Appendix B. We sample learning rate, EMA step size, batch size, weight decay; but crucially also number of epochs (or, correspondingly, number of replay streams for MI/RS) and an overall scaling of the model width (number of channels). |
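
To make the "Research Type" row concrete: the prequential description length the paper estimates is the accumulated online log-loss, where each example is predicted before the model is trained on it. The following is a minimal Python sketch of that predict-then-update loop, not the authors' code; `model.predict_proba` and `update` are hypothetical stand-ins for any incremental learner.

```python
import math

def prequential_code_length(stream, model, update):
    """Sum of -log2 p(y_t | x_t), where each prediction is made *before*
    the model has seen example t (predict-then-update)."""
    total_bits = 0.0
    for x, y in stream:                       # data in a fixed, shuffled order
        p = model.predict_proba(x)[y]         # probability of the true label
        total_bits += -math.log2(max(p, 1e-12))
        model = update(model, x, y)           # only now train on the example
    return total_bits
```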
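The "Pseudocode" row refers to Algorithm 1 (mini-batch incremental training with replay streams). The sketch below is one plausible reading of that idea, assuming each replay stream is simply a cursor that re-traverses the already-seen prefix of the data; the staggering and scheduling of the streams in the actual algorithm may differ, and `sgd_step` and `model.loss` are illustrative stand-ins.

```python
def train_with_replay_streams(stream, model, sgd_step, num_replay_streams=4):
    seen = []                                  # prefix of the data observed so far
    cursors = [0] * num_replay_streams         # one replay cursor per stream
    total_loss = 0.0
    for batch in stream:
        # Evaluate-then-train: the online loss on the new batch is the
        # contribution to the prequential description length.
        total_loss += model.loss(batch)
        model = sgd_step(model, batch)
        seen.append(batch)
        # One additional gradient step per replay stream, each stream
        # sweeping over the already-seen data at its own position.
        for i in range(num_replay_streams):
            model = sgd_step(model, seen[cursors[i] % len(seen)])
            cursors[i] += 1
    return model, total_loss
```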
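The "Dataset Splits" row describes post-calibration with a scalar β: with the network frozen, softmax(h) is replaced by softmax(softplus(β)h) and β is chosen to minimize the loss on the 10% calibration split. Below is a minimal NumPy sketch of that calibration step; the grid search over β is an assumption, since the paper does not state how β is optimized.

```python
import numpy as np

def softplus(b):
    return np.log1p(np.exp(b))

def calibrated_log_probs(h, beta):
    z = softplus(beta) * h                      # rescaled logits
    z = z - z.max(axis=-1, keepdims=True)       # numerically stable log-softmax
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def fit_beta(h_cal, y_cal, betas=np.linspace(-3.0, 3.0, 121)):
    """Pick the beta that minimizes negative log-likelihood on calibration data."""
    nlls = [-calibrated_log_probs(h_cal, b)[np.arange(len(y_cal)), y_cal].mean()
            for b in betas]
    return betas[int(np.argmin(nlls))]
```

Because softplus(β) is always positive, the calibration can only sharpen or smooth the predictive distribution; it never changes which class is ranked highest.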
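The "Experiment Setup" row lists the quantities that are randomly sampled per run. A hedged sketch of such a sampler is shown below; the intervals and discrete choices are placeholders, since the actual ranges are given in Appendix B of the paper.

```python
import random

def sample_hparams(rng=random):
    """Illustrative hyperparameter sampler; all ranges are placeholder assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),        # log-uniform
        "ema_step_size": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([32, 64, 128, 256]),
        "weight_decay": 10 ** rng.uniform(-6, -2),
        "num_replay_streams": rng.choice([1, 2, 4, 8]),     # or epochs, for block-wise
        "width_multiplier": rng.choice([0.5, 1.0, 2.0]),    # scales channel counts
    }
```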