Gradient $\ell_1$ Regularization for Quantization Robustness
Authors: Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets. |
| Researcher Affiliation | Collaboration | Milad Alizadeh (2,1), Arash Behboodi (1), Mart van Baalen (1), Christos Louizos (1), Tijmen Blankevoort (1), and Max Welling (1). (1) Qualcomm AI Research, Qualcomm Technologies Netherlands B.V., {behboodi,mart,clouizos,tijmen,mwelling}@qti.qualcomm.com; (2) University of Oxford, milad.alizadeh@cs.ox.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific repository link, explicit code release statement, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | Yes | We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets. |
| Dataset Splits | No | The paper uses the CIFAR-10 and ImageNet datasets, which have standard splits, but it does not explicitly state the training/validation/test partitioning (percentages, counts, or a statement that the standard splits were used). |
| Hardware Specification | Yes | The training was performed on a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using PyTorch but does not specify its version or provide version numbers for other software dependencies. |
| Experiment Setup | Yes | We use uniform symmetric quantization (Jacob et al., 2018; Krishnamoorthi, 2018) in all our experiments unless explicitly specified otherwise. For the CIFAR-10 experiments we fix the activation bit-widths to 4 bits and then vary the weight bits from 8 to 4. For the ImageNet experiments we use the same bit-width for both weights and activations. [...] we use a fixed weight decay of 1e-4. We use a grid-search to find the best setting for λ. [...] We therefore only enable regularization in the last 15 epochs of training or as an additional fine-tuning phase. |
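
The Experiment Setup row above names two technical ingredients: uniform symmetric quantization of weights and activations, and a gradient $\ell_1$ regularizer whose strength λ is found by grid search and enabled only late in training. The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `quantize_symmetric` is a generic max-abs uniform symmetric quantizer (clipping and rounding details may differ from the paper's setup), and `regularized_loss` assumes the regularizer is the $\ell_1$ norm of the task-loss gradient with respect to the weights, computed by double backpropagation; the names `lambda_reg`, `model`, and `criterion` are illustrative, not from the paper.

```python
# Minimal sketch under the assumptions stated above; not the authors' released code.
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Generic uniform symmetric quantizer: max-abs scaling onto a signed
    integer grid, then de-quantized back to float for simulation."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def regularized_loss(model, criterion, inputs, targets, lambda_reg=0.05):
    """Task loss plus an l1 penalty on its gradient w.r.t. the weights.
    lambda_reg stands in for the lambda chosen by grid search in the paper."""
    task_loss = criterion(model(inputs), targets)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True keeps the gradient computation differentiable so the
    # penalty term itself can be backpropagated (double backprop).
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    grad_l1 = sum(g.abs().sum() for g in grads)
    return task_loss + lambda_reg * grad_l1
```

In a training loop this regularized loss would simply replace the plain task loss during the final epochs (or fine-tuning phase) in which the regularizer is enabled, while the quantizer would be applied at evaluation time to simulate the reduced weight and activation bit-widths.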