A Block Minifloat Representation for Training Deep Neural Networks

Authors: Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland, Philip Leong

ICLR 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations were explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks, with results compared against well-calibrated INT8, FP8 and FP32 baselines. On other tasks, BM8 is compared with an FP32 baseline.

Researcher Affiliation | Academia | Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland & Philip Leong, School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, AUS. {first}.{last}@sydney.edu.au

Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.

Open Source Code | Yes | Our implementation is available at https://github.com/sfox14/block_minifloat

Open Datasets | Yes | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations were explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks

Dataset Splits | Yes | The ImageNet dataset has 1000 class labels, and consists of 256x256 images split into a training set with 1.28 million images and a validation set with 50,000 images.

Hardware Specification | No | The paper mentions GPUs and PyTorch, but does not specify exact GPU models, CPU models, or other detailed hardware specifications used for running experiments. It mentions 'one GPU' for ImageNet training but no specific model.

Software Dependencies | No | The paper mentions "PyTorch", "CUDA libraries" and the "QPyTorch library", but does not specify version numbers for these software components.

Experiment Setup | Yes | We ran CIFAR experiments using SGD with momentum of 0.9 for 200 epochs in batches of 128 images and an initial learning rate of 0.1 which is decayed by a factor of 5 at the 60th, 120th and 160th epochs. We use ResNet-18 (He et al., 2016) and AlexNet (Krizhevsky et al., 2012) architectures from the official PyTorch implementation, and train on one GPU with standard settings; SGD with momentum of 0.9, batches of 256 images, and an initial learning rate of 0.1 (0.01 for AlexNet) which is decayed by a factor of 10 at epochs 30 and 60.
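To make the reported hyperparameter schedules concrete, the sketch below sets them up with standard PyTorch SGD and MultiStepLR APIs. It is a minimal illustration only: the ResNet-18 model, the empty epoch loop and the data pipeline are placeholders, and none of the authors' block-minifloat quantization code from the linked repository is reproduced here.

```python
import torch
import torchvision

# Placeholder model (assumption): torchvision ResNet-18 with a CIFAR-sized head.
model = torchvision.models.resnet18(num_classes=10)

# CIFAR recipe reported in the paper: SGD, momentum 0.9, batches of 128,
# initial lr 0.1 decayed by a factor of 5 at epochs 60, 120 and 160, 200 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

# ImageNet recipe reported in the paper: SGD, momentum 0.9, batches of 256,
# initial lr 0.1 (0.01 for AlexNet) decayed by a factor of 10 at epochs 30 and 60.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# scheduler = torch.optim.lr_scheduler.MultiStepLR(
#     optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(200):
    # ... forward/backward passes over the training batches would go here ...
    optimizer.step()   # placeholder; a real loop steps once per batch
    scheduler.step()   # advance the learning-rate schedule once per epoch
```

The step-decay milestones mirror the text above: `gamma=0.2` implements the 5x CIFAR decay and `gamma=0.1` the 10x ImageNet decay.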