A Block Minifloat Representation for Training Deep Neural Networks

Authors: Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland, Philip Leong

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations were explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks, with results compared against well-calibrated INT8, FP8 and FP32 baselines. On other tasks, BM8 is compared with an FP32 baseline.

Researcher Affiliation | Academia | Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland & Philip Leong, School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, AUS. {first}.{last}@sydney.edu.au

Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.

Open Source Code | Yes | Our implementation is available at https://github.com/sfox14/block_minifloat

Open Datasets | Yes | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations were explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks

Dataset Splits | Yes | The ImageNet dataset has 1000 class labels, and consists of 256x256 images split into a training set with 1.28 million images and a validation set with 50,000 images.

Hardware Specification | No | The paper mentions GPUs and PyTorch, but does not specify exact GPU models, CPU models, or other detailed hardware specifications used for running experiments. It mentions 'one GPU' for ImageNet training but no specific model.

Software Dependencies | No | The paper mentions "PyTorch", "CUDA libraries" and the "QPyTorch library", but does not specify version numbers for these software components.

Experiment Setup | Yes | We ran CIFAR experiments using SGD with momentum of 0.9 for 200 epochs in batches of 128 images and an initial learning rate of 0.1 which is decayed by a factor of 5 at the 60th, 120th and 160th epochs. We use ResNet-18 (He et al., 2016) and AlexNet (Krizhevsky et al., 2012) architectures from the official PyTorch implementation, and train on one GPU with standard settings; SGD with momentum of 0.9, batches of 256 images, and an initial learning rate of 0.1 (0.01 for AlexNet) which is decayed by a factor of 10 at epochs 30 and 60.
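To make the reported hyperparameter schedules concrete, the sketch below sets them up with standard PyTorch SGD and MultiStepLR APIs. It is a minimal illustration only: the ResNet-18 model, the empty epoch loop and the data pipeline are placeholders, and none of the authors' block-minifloat quantization code from the linked repository is reproduced here.

```python
import torch
import torchvision

# Placeholder model (assumption): torchvision ResNet-18 with a CIFAR-sized head.
model = torchvision.models.resnet18(num_classes=10)

# CIFAR recipe reported in the paper: SGD, momentum 0.9, batches of 128,
# initial lr 0.1 decayed by a factor of 5 at epochs 60, 120 and 160, 200 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

# ImageNet recipe reported in the paper: SGD, momentum 0.9, batches of 256,
# initial lr 0.1 (0.01 for AlexNet) decayed by a factor of 10 at epochs 30 and 60.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# scheduler = torch.optim.lr_scheduler.MultiStepLR(
#     optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(200):
    # ... forward/backward passes over the training batches would go here ...
    optimizer.step()   # placeholder; a real loop steps once per batch
    scheduler.step()   # advance the learning-rate schedule once per epoch
```

The step-decay milestones mirror the text above: `gamma=0.2` implements the 5x CIFAR decay and `gamma=0.1` the 10x ImageNet decay.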