Sharpness-Aware Data Generation for Zero-shot Quantization

Authors: Hoang Anh Dung, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental evaluations on the CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over state-of-the-art techniques in low-bit quantization settings.
Researcher Affiliation | Academia | Department of Data Science and AI, Monash University, Melbourne, Australia.
Pseudocode | Yes | Algorithm 1: SA zero-shot quantization.
Open Source Code | No | The paper states that Genie's officially released code was used to produce some results, but it does not state that the code for SADAG (the proposed method) is released, nor does it provide a link.
Open Datasets | Yes | "We evaluate our approach on CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) datasets, which are commonly utilized for zero-shot quantization."
Dataset Splits | No | The paper uses the CIFAR-100 and ImageNet datasets but does not provide specific train/validation/test split percentages or sample counts for its experiments.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, or cloud instances) used to run its experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and specific learning-rate schedulers but does not give version numbers for any software components or libraries.
Experiment Setup | Yes | "The learning rates of the generator and embedding are initially set at 0.1 and 0.01, respectively. We adopt the Adam optimizer (Kingma & Ba, 2014) for both generator and data embedding, but utilize different schedulers for them, i.e., the ExponentialLR scheduler and ReduceLROnPlateau scheduler are used for scheduling the learning rates of the generator and the embeddings, respectively. Across all experiments, the batch size for the data generation process is set to 128, while in the quantization step, we keep the batch size at 32. The threshold ζ in Eq. (19) is set to 0 or 0.1. The radius ν in Eq. (16) for the embedding perturbation is set to 2."
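
To make the quoted setup concrete, below is a minimal PyTorch sketch of the data-generation loop it implies. Only the learning rates (0.1 / 0.01), the two scheduler types, the generation batch size of 128, and the perturbation radius ν = 2 come from the paper; the generator architecture, embedding dimension, ExponentialLR decay factor, ReduceLROnPlateau patience, iteration count, and surrogate objective are all placeholders, and the SAM-style ascent step is a generic sharpness-aware update, not the paper's exact Eq. (16).

```python
# Minimal sketch of the quoted data-generation setup (PyTorch).
# Placeholders (NOT from the paper): generator architecture, embedding size,
# ExponentialLR gamma, ReduceLROnPlateau patience, iteration count, and the
# surrogate objective. From the paper: learning rates (0.1 / 0.01), the two
# scheduler types, generation batch size 128, and perturbation radius nu = 2.
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ExponentialLR, ReduceLROnPlateau

BATCH_SIZE_GEN = 128  # batch size for data generation (from the paper)
EMB_DIM = 256         # embedding dimension: placeholder
NU = 2.0              # embedding-perturbation radius (nu in Eq. (16))

# Hypothetical stand-ins for the paper's generator and learnable embeddings.
generator = nn.Sequential(nn.Linear(EMB_DIM, 3 * 32 * 32), nn.Tanh())
embeddings = nn.Parameter(torch.randn(BATCH_SIZE_GEN, EMB_DIM))

def objective(gen, emb):
    """Stand-in generation loss; the paper's actual objective (including the
    threshold zeta in Eq. (19)) is not reproduced here."""
    return gen(emb).pow(2).mean()

# Adam for both modules, with the initial learning rates from the paper.
opt_gen = Adam(generator.parameters(), lr=0.1)
opt_emb = Adam([embeddings], lr=0.01)

# Different schedulers for generator and embeddings, as stated in the paper.
sched_gen = ExponentialLR(opt_gen, gamma=0.95)        # gamma: placeholder
sched_emb = ReduceLROnPlateau(opt_emb, patience=100)  # patience: placeholder

for step in range(4000):  # iteration count: placeholder
    # First pass: gradient of the loss w.r.t. the embeddings.
    loss = objective(generator, embeddings)
    opt_gen.zero_grad()
    opt_emb.zero_grad()
    loss.backward()

    # Generic SAM-style ascent: move the embeddings to the (approximately)
    # worst point within an L2 ball of radius NU. This stands in for the
    # paper's embedding perturbation; the exact Eq. (16) update may differ.
    with torch.no_grad():
        grad = embeddings.grad
        eps = NU * grad / (grad.norm() + 1e-12)
        embeddings.add_(eps)

    # Second pass at the perturbed embeddings; its gradients drive the update.
    perturbed_loss = objective(generator, embeddings)
    opt_gen.zero_grad()
    opt_emb.zero_grad()
    perturbed_loss.backward()

    # Undo the perturbation, then apply the optimizer steps.
    with torch.no_grad():
        embeddings.sub_(eps)
    opt_gen.step()
    opt_emb.step()
    sched_gen.step()
    sched_emb.step(perturbed_loss.item())  # plateau scheduler needs a metric
```

Note the asymmetry the paper's scheduler choice forces: ExponentialLR decays the generator's learning rate unconditionally each time it is stepped, whereas ReduceLROnPlateau only lowers the embeddings' learning rate when the metric passed to its step() stops improving.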