FIT: A Metric for Model Sensitivity

Authors: Ben Zandonati, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | These properties are validated experimentally across hundreds of quantization configurations, with a focus on layer-wise mixed-precision quantization.
Researcher Affiliation | Collaboration | Ben Zandonati, University of Cambridge (baz23@cam.ac.uk); Adrian Alan Pol, Princeton University (ap6964@princeton.edu); Maurizio Pierini, CERN (maurizio.pierini@cern.ch); Olya Sirkin, CEVA Inc. (sirkinolya@gmail.com); Tal Kopetz, CEVA Inc. (tal.kopetz@ceva-dsp.com)
Pseudocode | No | The paper describes computational steps and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We have included sample code for generating the parameter and activation traces, as well as generating and analysing quantized models.
Open Datasets | Yes | To evaluate trace performance and convergence in comparison to Hessian-based methods, we consider several computer vision architectures, trained on the ImageNet (Deng et al., 2009) dataset. [...] 100 CNN models, with and without batch-normalisation, are trained on the CIFAR-10 and MNIST datasets [...] We choose to quantify the effects of MPQ on the U-Net architecture (Ronneberger et al., 2015), for the Cityscapes semantic segmentation dataset (Cordts et al., 2016).
Dataset Splits | No | While the paper mentions training and testing, it does not explicitly specify train/validation/test dataset splits (e.g., percentages or exact sample counts) for reproducibility across all experiments.
Hardware Specification | Yes | The measurements were performed on an NVIDIA 2080Ti GPU. [...] O(100) hours of computation was performed on an RTX 2080 Ti (TDP of 250W), using a private infrastructure which has a carbon efficiency of 0.432 kg CO2eq/kWh.
Software Dependencies | No | The paper mentions software components like the Adam optimizer but does not specify version numbers for programming languages (e.g., Python), libraries (e.g., PyTorch), or other key software dependencies required for replication.
Experiment Setup | Yes | To obtain the data, we first trained a full precision version of the network for 50 epochs using the Adam optimizer. A learning rate of 0.01 was chosen, and increased to 0.1 with the inclusion of batch normalization. A cosine-annealing learning rate schedule was used. We then used this trained full precision model as a checkpoint to initialise our randomly chosen mixed precision configurations, and training was continued for another 30 epochs with a learning rate reduction of 0.1, using the same schedule. Quantization configurations were chosen uniformly at random from the possible set of bit precisions: [8,6,4,3].
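
A note on the "Open Source Code" row: the sample code referenced there generates parameter and activation traces. As a rough illustration only, the sketch below accumulates per-layer sums of squared parameter gradients (an empirical-Fisher-style trace) in PyTorch; the model, data loader, and loss function are placeholders, and this is not the authors' released implementation.

    import torch
    import torch.nn as nn

    def parameter_traces(model: nn.Module, loader, loss_fn, n_batches: int = 32):
        """Per-layer empirical-Fisher-style trace: mean over batches of the
        sum of squared parameter gradients. Illustrative sketch only."""
        traces = {name: 0.0 for name, _ in model.named_parameters()}
        model.eval()
        for i, (x, y) in enumerate(loader):
            if i >= n_batches:
                break
            model.zero_grad()
            loss_fn(model(x), y).backward()
            for name, p in model.named_parameters():
                if p.grad is not None:
                    traces[name] += p.grad.detach().pow(2).sum().item()
        return {name: t / n_batches for name, t in traces.items()}

Activation traces could be gathered analogously, e.g. by registering hooks that capture gradients with respect to each layer's activations instead of its parameters.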
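
For the "Hardware Specification" row, the quoted figures imply a back-of-the-envelope emissions estimate of roughly 100 h x 0.250 kW x 0.432 kg CO2eq/kWh ≈ 10.8 kg CO2eq, assuming the GPU runs near its full 250 W TDP for the stated O(100) hours; this is an illustrative calculation, not a figure quoted from the paper.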
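
The "Experiment Setup" row describes a two-stage protocol: full-precision pre-training, then fine-tuning of randomly drawn layer-wise mixed-precision configurations. The sketch below shows one plausible wiring of that schedule in PyTorch; the quantization helper, model, and data pipeline are assumptions standing in for the paper's actual code.

    import random
    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR

    BIT_CHOICES = [8, 6, 4, 3]  # bit precisions sampled uniformly at random

    def train(model, loader, loss_fn, epochs, lr):
        """Adam with a cosine-annealing schedule, as described in the setup row."""
        opt = Adam(model.parameters(), lr=lr)
        sched = CosineAnnealingLR(opt, T_max=epochs)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
            sched.step()
        return model

    # Stage 1: full-precision pre-training for 50 epochs (lr 0.01, or 0.1 with batch norm).
    #   model = train(model, loader, loss_fn, epochs=50, lr=0.01)
    # Stage 2: sample a per-layer bit-width configuration, quantize, and fine-tune for
    # 30 epochs from the full-precision checkpoint with the learning rate reduced by 0.1.
    # `quantize_layerwise` is a hypothetical placeholder for the MPQ machinery.
    #   config = [random.choice(BIT_CHOICES) for _ in range(num_quantizable_layers)]
    #   qmodel = quantize_layerwise(model, config)
    #   qmodel = train(qmodel, loader, loss_fn, epochs=30, lr=0.001)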