Sharpness-Aware Data Generation for Zero-shot Quantization
Authors: Hoang Anh Dung, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over the state-of-the-art techniques in low-bit quantization settings. |
| Researcher Affiliation | Academia | Department of Data Science and AI, Monash University, Melbourne, Australia. |
| Pseudocode | Yes | Algorithm 1 SA zero-shot quantization. |
| Open Source Code | No | The paper states that Genie's officially released code was used to produce some results, but it does not state that the code for SADAG (the proposed method) is released, nor does it provide a link. |
| Open Datasets | Yes | We evaluate our approach on CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) datasets, which are commonly utilized for zero-shot quantization. |
| Dataset Splits | No | The paper states the use of CIFAR-100 and ImageNet datasets but does not explicitly provide specific train/validation/test split percentages or sample counts for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or cloud computing instances) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and specific learning rate schedulers, but does not provide version numbers for any software components or libraries used. |
| Experiment Setup | Yes | The learning rates of the generator and embedding are initially set at 0.1 and 0.01, respectively. We adopt the Adam optimizer (Kingma & Ba, 2014) for both generator and data embedding, but utilize different schedulers for them, i.e., the ExponentialLR scheduler and the ReduceLROnPlateau scheduler are used for scheduling the learning rates of the generator and the embeddings, respectively. Across all experiments, the batch size for the data generation process is set to 128, while in the quantization step, we keep the batch size at 32. The threshold ζ in Eq. (19) is set to 0 or 0.1. The radius ν in Eq. (16) for the embedding perturbation is set to 2. |
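The reported setup maps naturally onto PyTorch. The sketch below is a minimal illustration, not the authors' implementation (no code is released): the Adam optimizer, `ExponentialLR`, and `ReduceLROnPlateau` schedulers, learning rates, batch size, and radius ν = 2 come from the paper, while the generator architecture, the `gamma` decay factor, the `generation_loss` stand-in, and the exact SAM-style perturbation rule are our assumptions based on the paper's sharpness-aware description.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: the paper releases no code, so the generator and the
# per-sample embeddings below are placeholders for illustration only.
generator = nn.Sequential(nn.Linear(256, 3 * 32 * 32), nn.Tanh())
embeddings = nn.Parameter(torch.randn(128, 256))  # batch size 128 for generation

# Optimizers/schedulers as reported: Adam for both, lr 0.1 (generator) and
# 0.01 (embeddings); ExponentialLR for the generator, ReduceLROnPlateau for
# the embeddings. gamma=0.95 is an assumed decay factor.
opt_gen = torch.optim.Adam(generator.parameters(), lr=0.1)
opt_emb = torch.optim.Adam([embeddings], lr=0.01)
sched_gen = torch.optim.lr_scheduler.ExponentialLR(opt_gen, gamma=0.95)
sched_emb = torch.optim.lr_scheduler.ReduceLROnPlateau(opt_emb, mode="min")

NU = 2.0  # radius of the embedding perturbation (Eq. 16)

def generation_loss(images: torch.Tensor) -> torch.Tensor:
    """Stand-in for the paper's data-generation objective (e.g., matching
    the full-precision model's statistics); not reproduced here."""
    return images.pow(2).mean()

for step in range(100):
    # First pass: the gradient w.r.t. the embeddings gives the ascent direction.
    loss = generation_loss(generator(embeddings))
    grad = torch.autograd.grad(loss, embeddings)[0]

    # SAM-style perturbation: move the embeddings to the (approximate)
    # worst point inside an L2 ball of radius NU, then descend from there.
    eps = NU * grad / (grad.norm() + 1e-12)

    # Second pass: minimize the loss evaluated at the perturbed embeddings.
    opt_gen.zero_grad()
    opt_emb.zero_grad()
    generation_loss(generator(embeddings + eps)).backward()
    opt_gen.step()
    opt_emb.step()

    sched_gen.step()
    sched_emb.step(loss.item())  # ReduceLROnPlateau tracks the raw loss
```

The threshold ζ from Eq. (19) and the quantization step (batch size 32) are omitted; the loop only illustrates how the two optimizers, their schedulers, and the radius-ν embedding perturbation fit together.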