Neural gradients are near-lognormal: improved quantized and sparse training
Authors: Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity, in each case without accuracy degradation. Reference implementation accompanies the paper in the supplementary material. |
| Researcher Affiliation | Collaboration | Habana Labs (an Intel company), Caesarea, Israel; Department of Electrical Engineering, Technion, Haifa, Israel |
| Pseudocode | Yes | Pseudo-code appears in Algorithm 1 |
| Open Source Code | Yes | Reference implementation accompanies the paper in the supplementary material. |
| Open Datasets | Yes | Each method achieves state-of-the-art results on ImageNet. ResNet18, ResNet101 on CIFAR-100; ResNet18, SqueezeNet on ImageNet. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248-255. IEEE, 2009. |
| Dataset Splits | Yes | The validation accuracy during training for different sparsity levels and different datasets can be found in Fig. A.16. In Table 3 we show the results of different allocations between exponent and mantissa for different FP formats on the CIFAR-100 and ImageNet datasets. |
| Hardware Specification | No | The paper mentions "HW accelerator" but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper discusses different floating-point formats and related work, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper mentions "All results were achieved using the suggested gradient scaling, where the mean is sampled once every epoch" but lacks comprehensive details on the experimental setup such as specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings. |
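
The FP-format result quoted above (Table 3 of the paper) concerns how a low-bit floating-point word is split between exponent and mantissa bits. As a rough illustration of what such an allocation means, the sketch below rounds a gradient tensor onto a simulated (exponent, mantissa) grid. This is a generic simulation written for this page, not the paper's FP6 implementation; the function name `quantize_fp`, the simplified symmetric exponent range, and the round-to-nearest mantissa are all assumptions.

```python
import torch


def quantize_fp(x: torch.Tensor, exp_bits: int = 5, man_bits: int = 2) -> torch.Tensor:
    """Round a tensor onto a simulated low-bit floating-point grid.

    Generic simulation of an (exponent, mantissa) split with a simplified
    exponent range; not the paper's exact FP6 definition.
    """
    sign = torch.sign(x)
    mag = x.abs().float()

    # Power-of-two exponent of each value (zeros are handled via the clamp).
    exp = torch.floor(torch.log2(mag.clamp(min=torch.finfo(torch.float32).tiny)))

    # Simplified representable exponent range for `exp_bits` exponent bits.
    max_exp = 2 ** (exp_bits - 1) - 1
    min_exp = -(2 ** (exp_bits - 1))
    exp = exp.clamp(min_exp, max_exp)

    # Round the mantissa to `man_bits` fractional bits.
    scale = 2.0 ** (exp - man_bits)
    q = torch.round(mag / scale) * scale

    # Saturate values that overflow the largest representable magnitude.
    max_val = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** max_exp
    return sign * q.clamp(max=max_val)
```

For a 6-bit word with one sign bit, `quantize_fp(g, exp_bits=4, man_bits=1)` and `quantize_fp(g, exp_bits=3, man_bits=2)` correspond to two possible exponent/mantissa allocations, which is the kind of trade-off the paper reportedly compares across CIFAR-100 and ImageNet.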
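The 85% gradient sparsity claim can likewise be pictured with a generic unbiased stochastic-pruning step: entries below a threshold are either zeroed or pushed up to the threshold so the gradient is preserved in expectation. The paper presumably derives such thresholds from its lognormal fit of the gradient distribution; the sketch below simply takes the threshold as an argument, and `stochastic_prune` is a hypothetical helper, not the authors' code.

```python
import torch


def stochastic_prune(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    """Unbiased stochastic pruning of small gradient entries.

    Entries with |g| >= threshold pass through unchanged. Smaller entries are
    kept (and pushed to +/- threshold) with probability |g| / threshold,
    otherwise zeroed, so the result equals the input in expectation.
    """
    mag = grad.abs()
    small = mag < threshold
    # Survival test for the small entries: larger ones survive more often.
    survive = torch.rand_like(grad) < mag / threshold
    out = torch.where(small & survive, torch.sign(grad) * threshold, grad)
    out = torch.where(small & ~survive, torch.zeros_like(grad), out)
    return out
```

With a threshold placed high in the magnitude distribution, most of the small entries are zeroed, giving a sparse gradient whose expectation still matches the dense one.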