Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate the impact of reconstruction granularity on quantization performance across various models using the ImageNet dataset. Notably, with 4/4-bit quantization on DeiT-tiny, we attain a Top-1 accuracy of 66.31%. Furthermore, our approach achieves a Top-1 accuracy of 80.50% on ViT-small, surpassing NoisyQuant by a margin of 3.64% (80.50% versus 76.86%).
Researcher Affiliation | Collaboration | (1) This work was done when Yuexiao Ma was an intern at ByteDance Inc. (2) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. (3) ByteDance Inc. (4) Peng Cheng Laboratory, Shenzhen, China. (5) Institute of Artificial Intelligence, Xiamen University.
Pseudocode | Yes | Algorithm 1: Granularity and Optimization
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology, nor a link to a code repository.
Open Datasets | Yes | We empirically validate the impact of reconstruction granularity on quantization performance across various models using the ImageNet dataset.
Dataset Splits | No | The paper mentions using "16 batch data for PTQ optimization" and "16 batches of 64 samples each from the training set for calibration" but does not specify a train/validation/test split with percentages or sample counts. (A calibration-loader sketch follows the table.)
Hardware Specification | No | The paper states, "We conduct our experiments on NVIDIA Tesla", but does not specify the exact GPU model (e.g., V100, A100).
Software Dependencies | No | The paper refers to the settings of methods such as AdaRound, BRECQ, and QDrop, but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | For the hyper-parameter settings of the optimization parameters, such as reconstruction iteration, learning rate, etc., we refer to the default settings of the above methods and keep them consistent. Please refer to Appendix F for details. ... We use 16 batches of 64 samples each from the training set for calibration. The learning rates are set at 1e-3 for the rounding parameter and 4e-5 for the quantization scale of the activation layer. The rounding loss rate is set at 0.1, with 20,000 iterations per optimization block. The activation value drop probability is 50%. We gradually reduce the power β of the progressive soft function from 20 to 2. (A soft-rounding sketch follows the table.)
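
The calibration protocol quoted in the "Dataset Splits" and "Experiment Setup" rows (16 batches of 64 samples drawn from the ImageNet training set) can be set up as in the following minimal sketch. The dataset path, the torchvision loader, and the preprocessing pipeline are assumptions for illustration; the paper does not specify them.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Standard ImageNet evaluation preprocessing (an assumption; the paper
# does not state its calibration transforms).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "/path/to/imagenet" is a placeholder for a local ImageNet copy.
train_set = datasets.ImageNet("/path/to/imagenet", split="train",
                              transform=preprocess)

# 16 batches x 64 samples = 1024 calibration images, sampled uniformly
# at random from the training set.
num_batches, batch_size = 16, 64
indices = torch.randperm(len(train_set))[: num_batches * batch_size]
calib_loader = DataLoader(Subset(train_set, indices.tolist()),
                          batch_size=batch_size, shuffle=False)

# Cache the batches that the PTQ reconstruction will replay.
calib_batches = [images for images, _ in calib_loader]
```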
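The "Experiment Setup" row lists AdaRound-style optimization hyper-parameters: learning rates of 1e-3 for the rounding parameter and 4e-5 for the activation quantization scale, a rounding-loss weight of 0.1, 20,000 iterations per block, and a soft-rounding power β annealed from 20 to 2. Below is a minimal sketch of that schedule, assuming AdaRound's rectified-sigmoid soft rounding (one of the paper's stated baselines) and a linear β decay, neither of which the paper confirms.

```python
import torch

GAMMA, ZETA = -0.1, 1.1  # rectified-sigmoid stretch constants from AdaRound

def soft_round(v: torch.Tensor) -> torch.Tensor:
    """Continuous relaxation of the per-weight rounding decision, in [0, 1]."""
    return torch.clamp(torch.sigmoid(v) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def rounding_reg(v: torch.Tensor, beta: float) -> torch.Tensor:
    """AdaRound regularizer: pushes each soft-rounding value toward 0 or 1.
    A large beta tolerates intermediate values; a small beta forces a decision."""
    return (1.0 - (2.0 * soft_round(v) - 1.0).abs().pow(beta)).sum()

def beta_schedule(step: int, total: int = 20_000,
                  start: float = 20.0, end: float = 2.0) -> float:
    """Anneal beta from 20 down to 2 (linear decay is an assumption;
    the paper only states the two endpoints)."""
    return start + (end - start) * step / max(total - 1, 1)

# Placeholders for one block's learnable quantization parameters.
v = torch.zeros(1024, requires_grad=True)        # rounding logits
a_scale = torch.tensor(0.1, requires_grad=True)  # activation quant scale
opt = torch.optim.Adam([{"params": [v], "lr": 1e-3},
                        {"params": [a_scale], "lr": 4e-5}])

for step in range(20_000):
    # recon_loss stands in for the block-output reconstruction error,
    # computed with QDrop-style 50% activation-quantization drop.
    recon_loss = torch.zeros(())
    loss = recon_loss + 0.1 * rounding_reg(v, beta_schedule(step))
    opt.zero_grad()
    loss.backward()
    opt.step()
```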