Activation Map Compression through Tensor Decomposition for Deep Learning

Authors: Le-Trung Nguyen, Aël Quélennec, Enzo Tartaglione, Samuel Tardieu, Van-Tam Nguyen

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results obtained on main-stream architectures and tasks demonstrate Pareto-superiority over other state-of-the-art solutions in terms of the trade-off between generalization and memory footprint. In this section, we describe the experiments conducted to support the claims presented in Sec. 3. First, we introduce the setups used for our experiments (Sec. 4.1); then, we analyze the energy distribution in the different dimensions of HOSVD, providing an overview of the typical values of K (Sec. 4.2); finally, we test our algorithm in different setups, state-of-the-art architectures and datasets to evaluate the trade-off between accuracy and memory footprint (Sec. 4.3). (See the HOSVD rank-selection sketch below the table.)
Researcher Affiliation | Academia | LTCI, Télécom Paris, Institut Polytechnique de Paris, {name.surname}@telecom-paris.fr
Pseudocode | No | The paper describes mathematical formulas and derivations in detail, particularly in Section 3 and Appendix A.3, but it does not present a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code: https://github.com/Le-TrungNguyen/NeurIPS2024-ActivationCompression.git
Open Datasets | Yes | We load models pre-trained on ImageNet [21] and we fine-tune them on a variety of downstream datasets (CIFAR-10, CIFAR-100, CUB [46], Flowers [32] and Pets [54]). Each classification dataset (ImageNet, CIFAR-10/100) is split into two non-i.i.d. partitions of equal size using the FedAvg [29] method. Semantic Segmentation: ... we fine-tune models pretrained on Cityscapes [6] by MMSegmentation [5]. Here there is only one downstream dataset, which is Pascal-VOC12 [3].
Dataset Splits | Yes | Each classification dataset (ImageNet, CIFAR-10/100) is split into two non-i.i.d. partitions of equal size using the FedAvg [29] method. The partitions are then split as follows: 80% for training and 20% for validation. (See the partitioning sketch below the table.)
Hardware Specification | Yes | Experiments were performed using an NVIDIA RTX 3090 Ti.
Software Dependencies | Yes | The source code uses PyTorch 1.13.1.
Experiment Setup | Yes | Specifically, we fine-tune the checkpoints for 90 epochs with L2 gradient clipping with a threshold of 2.0. We use SGD with a weight decay of 1 × 10^-4 and a momentum of 0.9. The data is randomly resized, randomly flipped, normalized, and divided into batches of 64 elements. We use cross-entropy as the loss function. The learning rate increases linearly over 4 warm-up epochs up to 0.005, then decays according to the cosine annealing method. We set the batch size to 128. (See the training-setup sketch below the table.)
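
The "Research Type" entry above refers to the energy distribution across the different dimensions of HOSVD and the resulting ranks K. The following is a minimal PyTorch sketch of higher-order SVD compression with energy-threshold rank selection; the function names (hosvd_compress, rank_for_energy) and the default threshold eps = 0.9 are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of HOSVD-based activation compression with energy-threshold
# rank selection, assuming a 4D activation map (batch, channels, height, width).
import torch

def mode_unfold(x: torch.Tensor, mode: int) -> torch.Tensor:
    # Move the chosen mode to the front and flatten the remaining modes.
    return x.movedim(mode, 0).reshape(x.shape[mode], -1)

def rank_for_energy(s: torch.Tensor, eps: float) -> int:
    # Smallest K such that the leading K singular values retain a fraction
    # eps of the total energy (sum of squared singular values).
    energy = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    k = int((energy < eps).sum().item()) + 1
    return min(k, s.numel())

def hosvd_compress(x: torch.Tensor, eps: float = 0.9):
    # For each mode, keep the K leading left singular vectors of the unfolding,
    # then project x onto them to obtain the (much smaller) core tensor.
    factors, core = [], x
    for mode in range(x.dim()):
        u, s, _ = torch.linalg.svd(mode_unfold(x, mode), full_matrices=False)
        k = rank_for_energy(s, eps)
        factors.append(u[:, :k])                                     # (I_mode, K_mode)
        core = torch.tensordot(core, u[:, :k], dims=([mode], [0]))   # contract mode
        core = core.movedim(-1, mode)                                # restore mode order
    return core, factors                                             # store these instead of x

def hosvd_reconstruct(core: torch.Tensor, factors):
    # Approximate reconstruction of the activation map from the compressed form.
    x = core
    for mode, u in enumerate(factors):
        x = torch.tensordot(x, u, dims=([mode], [1])).movedim(-1, mode)
    return x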
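
The "Dataset Splits" entry states that each classification dataset is split into two equal-size non-i.i.d. partitions in the manner of FedAvg [29], each then divided 80/20 into training and validation sets. Below is a hedged sketch of one common way to realize such a split, dealing label-sorted shards to two partitions; the shard count and the helper names are assumptions, since the paper does not specify these details.

# Hedged sketch of a two-way non-i.i.d. split (label-sorted shards, in the
# spirit of FedAvg) followed by an 80/20 train/validation split of each half.
import torch
from torch.utils.data import Subset, random_split

def non_iid_two_way_split(dataset, labels, shards_per_partition=100, seed=0):
    # Sort sample indices by label, cut them into equal shards, and deal the
    # shards randomly to two partitions so each partition sees a skewed subset
    # of the classes (non-i.i.d.) while both halves have equal size.
    order = sorted(range(len(dataset)), key=lambda i: labels[i])
    n_shards = 2 * shards_per_partition
    shard_size = len(order) // n_shards
    shards = [order[i * shard_size:(i + 1) * shard_size] for i in range(n_shards)]
    perm = torch.randperm(n_shards, generator=torch.Generator().manual_seed(seed)).tolist()
    part_a = [i for s in perm[:shards_per_partition] for i in shards[s]]
    part_b = [i for s in perm[shards_per_partition:] for i in shards[s]]
    return Subset(dataset, part_a), Subset(dataset, part_b)

def train_val_split(partition, seed=0):
    # 80% training / 20% validation, as stated in the paper.
    n_train = int(0.8 * len(partition))
    return random_split(partition, [n_train, len(partition) - n_train],
                        generator=torch.Generator().manual_seed(seed))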
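
Finally, the "Experiment Setup" entry lists concrete fine-tuning hyper-parameters (SGD with momentum 0.9 and weight decay 1 × 10^-4, 4 linear warm-up epochs up to a learning rate of 0.005, cosine-annealing decay, L2 gradient clipping at 2.0, cross-entropy loss). The sketch below assembles these pieces in PyTorch as a plausible reading of that description; the placeholder model, the synthetic data, and the per-epoch scheduler stepping are assumptions rather than the authors' training script.

# Minimal sketch of the reported fine-tuning configuration.
import torch
from torch import nn
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data so the sketch runs end to end; the paper instead
# fine-tunes ImageNet-pretrained backbones on real downstream datasets.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

epochs, warmup_epochs, peak_lr = 90, 4, 0.005
optimizer = torch.optim.SGD(model.parameters(), lr=peak_lr,
                            momentum=0.9, weight_decay=1e-4)
# Linear warm-up to the peak learning rate over 4 epochs,
# then cosine-annealing decay for the remaining epochs.
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=1e-2, total_iters=warmup_epochs),
                CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs)],
    milestones=[warmup_epochs],
)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        # L2 gradient-norm clipping with a threshold of 2.0, as reported.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
        optimizer.step()
    scheduler.step()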