Ultra-Low Precision 4-bit Training of Deep Neural Networks
Authors: Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the robustness of the proposed 4-bit training scheme, we examined the impact of using INT4 weights and activations and FP4 gradients on a spectrum of computer vision models on the CIFAR10 [38] and ImageNet [39] datasets, as summarized in Tables 1 and 2 respectively. These emulation results were performed using a custom-modified PyTorch framework that implemented all of the precisions and schemes discussed in the paper (details in the Appendix-A). (A minimal INT4 emulation sketch follows the table.) |
| Researcher Affiliation | Industry | IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA |
| Pseudocode | Yes | Figure 3: GradScale, a per-layer trainable scaling factor: (a) GradScale Update Algorithm. (A hedged sketch of such a per-layer scale follows the table.) |
| Open Source Code | No | The paper mentions using a 'custom-modified PyTorch framework' but does not state that the code for their specific methodology is open-sourced or available. |
| Open Datasets | Yes | To demonstrate the robustness of the proposed 4-bit training scheme, we examined the impact of using INT4 weights and activations and FP4 gradients on a spectrum of computer vision models on the CIFAR10 [38] and ImageNet [39] datasets, as summarized in Tables 1 and 2 respectively. |
| Dataset Splits | No | The paper mentions using 'default network architectures' and datasets like CIFAR10 and ImageNet, but does not explicitly specify the exact training, validation, and test splits (e.g., percentages or counts) used for reproduction. |
| Hardware Specification | No | The paper discusses general hardware accelerators like GPUs and TPUs and references hardware design costs, but it does not specify the exact models or configurations of GPUs, CPUs, or other hardware used to run their experiments (e.g., 'NVIDIA A100', 'Tesla V100'). |
| Software Dependencies | No | The paper mentions using a 'custom-modified PyTorch framework', but does not provide specific version numbers for PyTorch or any other software libraries or dependencies used in their experiments. |
| Experiment Setup | No | The paper states, 'For all of these models, we used default network architectures, pre-processing techniques, hyper-parameters and optimizers with 4-bit training.' However, it does not provide concrete numerical values for these hyperparameters (e.g., learning rate, batch size, number of epochs) or specific optimizer settings, which are crucial for replicating the experimental setup. |
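
The Research Type row quotes the paper's description of emulating INT4 weights/activations and FP4 gradients inside a custom-modified PyTorch framework. That framework is not public, so the following is only a minimal sketch of how INT4 fake quantization of a tensor can be emulated in stock PyTorch; the per-tensor max-abs scale and round-to-nearest rounding are illustrative assumptions, not the paper's exact quantization scheme.

```python
import torch

def fake_quant_int4(x: torch.Tensor) -> torch.Tensor:
    """Emulate INT4 quantization: snap x onto a signed 4-bit grid
    (integer levels -8..7), then dequantize back to FP32 so the
    surrounding training math still runs in full precision.
    The per-tensor max-abs scaling and round-to-nearest used here
    are illustrative assumptions, not the paper's exact scheme."""
    qmin, qmax = -8.0, 7.0
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale

# Example: emulate a 4-bit linear layer's forward matmul
w = torch.randn(128, 64)   # weights
a = torch.randn(32, 64)    # activations
out = fake_quant_int4(a) @ fake_quant_int4(w).t()
```

In an emulation setup like the one the paper describes, such a function would typically be applied to weights and activations in the forward pass while optimizer state and accumulations remain in higher precision.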
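
The Pseudocode row points to Figure 3's GradScale, a per-layer trainable scaling factor for keeping FP4 gradients in a representable range. The paper's exact update algorithm is given only in that figure; the sketch below illustrates the general idea with an assumed power-of-two update rule and an assumed FP4 magnitude limit, neither of which is taken from the paper.

```python
import torch

class GradScaleSketch:
    """Illustrative per-layer gradient scale: boost or shrink gradients so
    their magnitudes stay within an assumed FP4-representable range before
    quantization, then undo the scale afterward. The threshold values and
    power-of-two updates are assumptions, not Figure 3's exact algorithm."""

    def __init__(self, repr_max: float = 6.0):
        self.scale = 1.0            # per-layer scaling factor
        self.repr_max = repr_max    # assumed largest magnitude FP4 can hold

    def update(self, grad: torch.Tensor) -> None:
        gmax = (grad.abs().max() * self.scale).item()
        if gmax == 0.0:
            return
        if gmax > self.repr_max:            # scaled gradients would overflow
            self.scale *= 0.5
        elif gmax < 0.5 * self.repr_max:    # unused headroom: grow the scale
            self.scale *= 2.0

    def pre_quant(self, grad: torch.Tensor) -> torch.Tensor:
        return grad * self.scale            # apply before FP4 quantization

    def post_quant(self, grad: torch.Tensor) -> torch.Tensor:
        return grad / self.scale            # undo after dequantization
```

In a training loop, `update` would be called once per step from the layer's observed gradients, with `pre_quant`/`post_quant` wrapped around the FP4 quantize/dequantize step for that layer's gradient tensor.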