A Block Minifloat Representation for Training Deep Neural Networks
Authors: Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland, Philip Leong
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations was explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks, with results compared against well-calibrated INT8, FP8 and FP32 baselines. On other tasks, BM8 is compared with an FP32 baseline. |
| Researcher Affiliation | Academia | Sean Fox, Seyedramin Rasoulinezhad, Julian Faraone, David Boland & Philip Leong School of Electrical and Information Engineering The University of Sydney Sydney, NSW 2006, AUS {first}.{last}@sydney.edu.au |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/sfox14/block_minifloat |
| Open Datasets | Yes | We evaluated the training accuracy of BM on a subset of image, language and object detection modelling tasks. The entire spectrum of representations was explored on ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky et al., 2009) image recognition benchmarks |
| Dataset Splits | Yes | The ImageNet dataset has 1000 class labels, and consists of 256x256 images split into a training set with 1.28 million images and a validation set with 50,000 images. |
| Hardware Specification | No | The paper mentions GPUs and PyTorch, but does not specify exact GPU models, CPU models, or other detailed hardware specifications used for running experiments. It mentions 'one GPU' for ImageNet training but no specific model. |
| Software Dependencies | No | The paper mentions "PyTorch", "CUDA libraries", and the "QPyTorch library", but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We ran CIFAR experiments using SGD with momentum of 0.9 for 200 epochs in batches of 128 images and initial learning rate of 0.1 which is decayed by a factor of 5 at the 60th, 120th and 160th epochs. We use ResNet-18 (He et al., 2016) and AlexNet (Krizhevsky et al., 2012) architectures from the official PyTorch implementation, and train on one GPU with standard settings; SGD with momentum of 0.9, batches of 256 images, and an initial learning rate of 0.1 (0.01 for AlexNet) which is decayed by a factor of 10 at epoch 30 and 60. |
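
To make the quoted CIFAR schedule concrete, the sketch below wires the stated hyperparameters (SGD with momentum 0.9, batch size 128, initial learning rate 0.1 decayed by a factor of 5 at epochs 60, 120 and 160, over 200 epochs) into a standard PyTorch loop. The torchvision ResNet-18 and CIFAR-10 loader are illustrative assumptions, and the BM quantization itself is not reproduced here; it lives in the authors' repository linked above.

```python
# Minimal sketch of the described CIFAR schedule, assuming a torchvision
# ResNet-18 and CIFAR-10 as stand-ins; this is not the authors' exact code.
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)  # stand-in architecture
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# "decayed by a factor of 5" -> multiply the learning rate by 0.2
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The ImageNet setting quoted in the same row follows the same pattern but with batches of 256 images, an initial learning rate of 0.1 (0.01 for AlexNet), and a decay of 10x at epochs 30 and 60.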