Gradient $\ell_1$ Regularization for Quantization Robustness
Authors: Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets. |
| Researcher Affiliation | Collaboration | Milad Alizadeh (2,1), Arash Behboodi (1), Mart van Baalen (1), Christos Louizos (1), Tijmen Blankevoort (1), and Max Welling (1). (1) Qualcomm AI Research, Qualcomm Technologies Netherlands B.V., {behboodi,mart,clouizos,tijmen,mwelling}@qti.qualcomm.com; (2) University of Oxford, milad.alizadeh@cs.ox.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific repository link, explicit code release statement, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | Yes | We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets. |
| Dataset Splits | No | The paper uses the CIFAR-10 and ImageNet datasets, which have standard splits, but it does not explicitly state the training/validation/test partitioning (percentages, counts, or a statement that the standard splits were used). |
| Hardware Specification | Yes | The training was performed on a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions using PyTorch but does not specify its version or provide version numbers for other software dependencies. |
| Experiment Setup | Yes | We use uniform symmetric quantization (Jacob et al., 2018; Krishnamoorthi, 2018) in all our experiments unless explicitly specified otherwise. For the CIFAR-10 experiments we fix the activation bit-widths to 4 bits and then vary the weight bits from 8 to 4. For the ImageNet experiments we use the same bit-width for both weights and activations. [...] we use a fixed weight decay of 1e-4. We use a grid-search to find the best setting for λ. [...] We therefore only enable regularization in the last 15 epochs of training or as an additional fine-tuning phase. |
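
The Experiment Setup row above names two technical ingredients: uniform symmetric quantization of weights and activations, and a gradient $\ell_1$ regularizer whose strength λ is found by grid search and enabled only late in training. The PyTorch sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `quantize_symmetric` is a generic max-abs uniform symmetric quantizer (clipping and rounding details may differ from the paper's setup), and `regularized_loss` assumes the regularizer is the $\ell_1$ norm of the task-loss gradient with respect to the weights, computed by double backpropagation; the names `lambda_reg`, `model`, and `criterion` are illustrative, not from the paper.

```python
# Minimal sketch under the assumptions stated above; not the authors' released code.
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Generic uniform symmetric quantizer: max-abs scaling onto a signed
    integer grid, then de-quantized back to float for simulation."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def regularized_loss(model, criterion, inputs, targets, lambda_reg=0.05):
    """Task loss plus an l1 penalty on its gradient w.r.t. the weights.
    lambda_reg stands in for the lambda chosen by grid search in the paper."""
    task_loss = criterion(model(inputs), targets)
    params = [p for p in model.parameters() if p.requires_grad]
    # create_graph=True keeps the gradient computation differentiable so the
    # penalty term itself can be backpropagated (double backprop).
    grads = torch.autograd.grad(task_loss, params, create_graph=True)
    grad_l1 = sum(g.abs().sum() for g in grads)
    return task_loss + lambda_reg * grad_l1
```

In a training loop this regularized loss would simply replace the plain task loss during the final epochs (or fine-tuning phase) in which the regularizer is enabled, while the quantizer would be applied at evaluation time to simulate the reduced weight and activation bit-widths.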