Sequential Learning of Neural Networks for Prequential MDL

Authors: Jorg Bornschein, Yazhe Li, Marcus Hutter

ICLR 2023

Reproducibility assessment (variable, result, and LLM response for each):
Research Type: Experimental
LLM Response: In this study, we evaluate approaches for computing prequential description lengths for image classification datasets with neural networks. Considering the computational cost, we find that online-learning with rehearsal has favorable performance compared to the previously widely used block-wise estimation.

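For readers unfamiliar with the term, the prequential (predictive sequential) description length is the cumulative log-loss a model incurs on each example before that example is added to its training data. A minimal formulation, with notation assumed here rather than copied from the paper:

    L_preq(y_{1:n} | x_{1:n}) = - \sum_{t=1}^{n} \log p_{\theta_{t-1}}(y_t \mid x_t)

where \theta_{t-1} denotes the parameters obtained after training on the first t-1 labeled examples. The block-wise estimation mentioned in the abstract approximates this sum by re-training only at a small number of cut points and scoring whole blocks of data with each resulting model.
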
Researcher Affiliation: Industry
LLM Response: Jorg Bornschein (bornschein@deepmind.com), Yazhe Li (yazhe@deepmind.com), Marcus Hutter (mhutter@deepmind.com).

Pseudocode: Yes
LLM Response: Algorithm 1, "Mini-batch Incremental Training with Replay Streams".

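Algorithm 1 itself is not reproduced on this page. Below is a minimal, hypothetical Python sketch of the general pattern the algorithm's name suggests: score each incoming mini-batch with the current model before training on it, then interleave an update on the new batch with updates on batches replayed from previously seen data. All identifiers (stream, model, update_fn, log_loss_fn, num_replay_streams) are illustrative, not the paper's API; the exact replay scheduling is specified only by Algorithm 1 in the paper.

    def prequential_train(stream, model, update_fn, log_loss_fn,
                          num_replay_streams=4, batch_size=32):
        """Sketch: mini-batch incremental training with replay from past data."""
        past = []                                   # all examples seen so far
        cursors = [0] * num_replay_streams          # one read position per replay stream
        total_codelength = 0.0

        for batch in stream:                        # batches arrive in a fixed order
            # 1) Prequential step: score the new batch with the current model
            #    before it influences the parameters.
            total_codelength += log_loss_fn(model, batch)

            # 2) One gradient update on the new batch.
            update_fn(model, batch)

            # 3) Replay: each stream sweeps over the past data at its own position.
            past.extend(batch)
            for i in range(num_replay_streams):
                if past:
                    start = cursors[i] % len(past)
                    replay_batch = past[start:start + batch_size]
                    update_fn(model, replay_batch)
                    cursors[i] += batch_size

        return total_codelength

Consistent with the abstract quoted above, the appeal of such replay-based online learning is favorable description lengths at lower computational cost than repeated block-wise re-training.
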
Open Source Code: No
LLM Response: The paper does not provide an explicit statement or link to its own open-source code.

Open Datasets: Yes
LLM Response: We use MNIST (LeCun et al., 2010), EMNIST, CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) and randomly shuffle each into a fixed sequence of examples.

Dataset Splits: Yes
LLM Response: At each stage we split the data D_{<s_k} into a 90% training and a 10% calibration set. Conceptually, we could perform post-calibration by first training the network to convergence and then, with all parameters frozen, replacing the output layer softmax(h) with the calibrated output layer softmax(softplus(β)h), where β is a scalar parameter chosen to minimize the loss on calibration data.

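The post-calibration described in that quote amounts to rescaling the frozen network's logits by a single learned scalar, similar in spirit to temperature scaling. Below is a minimal sketch, assuming precomputed logits (the pre-softmax outputs h) and integer labels from the 10% calibration split; the function names and the use of scipy are choices made for this sketch, not the paper's.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def softplus(b):
        return np.log1p(np.exp(b))

    def calibration_nll(beta, logits, labels):
        """Negative log-likelihood of softmax(softplus(beta) * h) on the calibration split."""
        z = softplus(beta) * logits
        z = z - z.max(axis=1, keepdims=True)          # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    def fit_beta(logits, labels):
        """Pick the scalar beta that minimizes the loss on calibration data."""
        result = minimize_scalar(calibration_nll, args=(logits, labels),
                                 bounds=(-10.0, 10.0), method="bounded")
        return result.x

Only β is fit in this step; all network parameters stay frozen, exactly as the quote describes.
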
Hardware Specification: No
LLM Response: The paper does not explicitly describe the hardware used for the experiments, such as GPU or CPU models; it implicitly refers to training on a GPU and in a data center but gives no further detail.

Software Dependencies: No
LLM Response: The paper mentions software such as the AdamW and RMSProp optimizers and RandAugment for data augmentation, but it does not specify version numbers for any software dependencies (e.g., PyTorch 1.9 or Python 3.8).

Experiment Setup: Yes
LLM Response: The hyperparameter intervals depend on the data and are detailed in Appendix B. We sample learning rate, EMA step size, batch size, weight decay; but crucially also number of epochs (or, correspondingly, number of replay streams for MI/RS) and an overall scaling of the model width (number of channels).

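To make the search procedure concrete, here is a hypothetical single draw of the sampled hyperparameters. The ranges below are placeholders chosen for illustration; the actual, data-dependent intervals are given in the paper's Appendix B and are not reproduced here.

    import random

    def sample_hyperparameters(rng=random):
        """One random-search draw; ranges are illustrative only (see Appendix B of the paper)."""
        return {
            "learning_rate": 10 ** rng.uniform(-4, -2),        # log-uniform draw (assumed)
            "ema_step_size": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([32, 64, 128, 256]),
            "weight_decay": 10 ** rng.uniform(-6, -2),
            "epochs_or_replay_streams": rng.choice([1, 2, 4, 8, 16]),
            "width_scale": rng.choice([0.25, 0.5, 1.0, 2.0]),
        }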