FIT: A Metric for Model Sensitivity
Authors: Ben Zandonati, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These properties are validated experimentally across hundreds of quantization configurations, with a focus on layer-wise mixed-precision quantization. |
| Researcher Affiliation | Collaboration | Ben Zandonati, University of Cambridge, baz23@cam.ac.uk; Adrian Alan Pol, Princeton University, ap6964@princeton.edu; Maurizio Pierini, CERN, maurizio.pierini@cern.ch; Olya Sirkin, CEVA Inc., sirkinolya@gmail.com; Tal Kopetz, CEVA Inc., tal.kopetz@ceva-dsp.com |
| Pseudocode | No | The paper describes computational steps and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We have included sample code for generating the parameter and activation traces, as well as generating and analysing quantized models. |
| Open Datasets | Yes | To evaluate trace performance and convergence in comparison to Hessian-based methods, we consider several computer vision architectures, trained on the ImageNet (Deng et al., 2009) dataset. [...] 100 CNN models, with and without batch-normalisation, are trained on the CIFAR-10 and MNIST datasets [...] We choose to quantify the effects of MPQ on the U-Net architecture (Ronneberger et al., 2015), for the Cityscapes semantic segmentation dataset (Cordts et al., 2016). |
| Dataset Splits | No | While the paper mentions training and testing, it does not explicitly specify train/validation/test dataset splits (e.g., percentages or exact sample counts) for reproducibility across all experiments. |
| Hardware Specification | Yes | The measurements were performed on an NVidia 2080Ti GPU. [...] O(100) hours of computation was performed on an RTX 2080 Ti (TDP of 250W), using a private infrastructure which has a carbon efficiency of 0.432 kg CO2eq/kWh. (A worked footprint estimate follows the table.) |
| Software Dependencies | No | The paper mentions software components like the Adam optimizer but does not specify version numbers for programming languages (e.g., Python), libraries (e.g., PyTorch), or other key software dependencies required for replication. |
| Experiment Setup | Yes | To obtain the data, we first trained a full precision version of the network for 50 epochs using the Adam optimizer. A learning rate of 0.01 was chosen, and increased to 0.1 with the inclusion of batch normalization. A cosine-annealing learning rate schedule was used. We then used this trained full precision model as a checkpoint to initialise our randomly chosen mixed precision configurations, and training was continued for another 30 epochs with a learning rate reduction of 0.1, using the same schedule. Quantization configurations were chosen uniformly at random from the possible set of bit precisions: [8,6,4,3]. |
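
The Experiment Setup row describes a two-phase protocol: full-precision training with Adam and a cosine-annealing schedule, then fine-tuning from that checkpoint under a mixed-precision configuration sampled uniformly at random from [8,6,4,3]. The sketch below illustrates that schedule in PyTorch. It is a minimal illustration, not the authors' released code: the toy CNN, the synthetic batch, and the symmetric straight-through fake quantizer are assumptions made here so the snippet runs end to end.

```python
import random
import torch
import torch.nn as nn
from torch.nn.utils import parametrize

BITS = [8, 6, 4, 3]  # candidate per-layer precisions stated in the paper


def fake_quantize(w, bits):
    """Symmetric uniform fake quantization with a straight-through estimator.

    An assumption made for this sketch; the paper's exact quantizer may differ.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    wq = (w / scale).round().clamp(-qmax, qmax) * scale
    return w + (wq - w).detach()  # forward uses wq; gradients flow to w


class FakeQuant(nn.Module):
    """Weight parametrization applying fake quantization at a fixed bit width."""

    def __init__(self, bits):
        super().__init__()
        self.bits = bits

    def forward(self, w):
        return fake_quantize(w, self.bits)


model = nn.Sequential(  # toy stand-in for the paper's CNNs
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
x = torch.randn(32, 3, 32, 32)        # synthetic batch in place of CIFAR-10/MNIST
y = torch.randint(0, 10, (32,))
loss_fn = nn.CrossEntropyLoss()


def train(epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        sched.step()


# Phase 1: full-precision training (50 epochs, Adam, cosine annealing).
# The paper uses lr 0.01, increased to 0.1 when batch normalization is included.
train(epochs=50, lr=0.01)

# Phase 2: sample a mixed-precision configuration uniformly at random over the
# conv/linear layers, then fine-tune from the full-precision checkpoint for
# 30 epochs with the learning rate reduced by 0.1, using the same schedule.
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        parametrize.register_parametrization(m, "weight", FakeQuant(random.choice(BITS)))
train(epochs=30, lr=0.001)
```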
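
As a quick check on the Hardware Specification row: reading O(100) hours as roughly 100 h, the implied energy use is 100 h × 0.25 kW = 25 kWh, and 25 kWh × 0.432 kg CO2eq/kWh ≈ 10.8 kg CO2eq. This is an order-of-magnitude estimate, since O(100) is not an exact figure.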