Activation Map Compression through Tensor Decomposition for Deep Learning
Authors: Le-Trung Nguyen, Aël Quélennec, Enzo Tartaglione, Samuel Tardieu, Van-Tam Nguyen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results obtained on mainstream architectures and tasks demonstrate Pareto-superiority over other state-of-the-art solutions, in terms of the trade-off between generalization and memory footprint. In this section, we describe the experiments conducted to support the claims presented in Sec. 3. First, we introduce the setups used for our experiments (Sec. 4.1); then, we analyze the energy distribution in the different dimensions of HOSVD, providing an overview of the typical values of K (Sec. 4.2); finally, we test our algorithm in different setups, state-of-the-art architectures and datasets to evaluate the trade-off between accuracy and memory footprint (Sec. 4.3). (A hedged HOSVD compression sketch is given after this table.) |
| Researcher Affiliation | Academia | LTCI, Télécom Paris, Institut Polytechnique de Paris {name.surname}@telecom-paris.fr |
| Pseudocode | No | The paper describes mathematical formulas and derivations in detail, particularly in Section 3 and Appendix A.3, but it does not present a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code: https://github.com/Le-TrungNguyen/NeurIPS2024-ActivationCompression.git |
| Open Datasets | Yes | We load models pre-trained on ImageNet [21] and we fine-tune them on a variety of downstream datasets (CIFAR-10, CIFAR-100, CUB [46], Flowers [32] and Pets [54]). Each classification dataset (ImageNet, CIFAR-10/100) is split into two non-i.i.d. partitions of equal size using the FedAvg [29] method. Semantic Segmentation... fine-tune models pretrained on Cityscapes [6] by MMSegmentation [5]. Here there is only one downstream dataset, which is Pascal-VOC12 [3]. |
| Dataset Splits | Yes | Each classification dataset (ImageNet, CIFAR-10/100) is split into two non-i.i.d. partitions of equal size using the FedAvg [29] method. The partitions are then split as follows: 80% for training and 20% for validation. (A hedged sketch of this split follows the table.) |
| Hardware Specification | Yes | Experiments were performed using an NVIDIA RTX 3090 Ti |
| Software Dependencies | Yes | The source code uses PyTorch 1.13.1 |
| Experiment Setup | Yes | Specifically, we fine-tune the checkpoints for 90 epochs with L2 gradient clipping with a threshold of 2.0. We use SGD with a weight decay of 1 × 10⁻⁴ and a momentum of 0.9. The data is randomly resized, randomly flipped, normalized, and divided into batches of 64 elements. We use cross-entropy as the loss function. The learning rate increases linearly over 4 warm-up epochs up to 0.005. The learning rate decays according to the cosine annealing method. We set the batch size to 128. (A hedged optimizer/scheduler sketch follows the table.) |
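
The Research Type row above mentions analyzing the energy distribution across HOSVD modes to pick the truncation ranks K. As an illustration only, the following is a minimal PyTorch sketch of truncated HOSVD applied to a single activation tensor, with each mode's rank chosen from an explained-energy threshold; the function names, the `eps` parameter, and the unfold/fold helpers are assumptions for this sketch, not the authors' released implementation.

```python
import torch

def unfold(tensor, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the rest."""
    return tensor.movedim(mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return matrix.reshape(full).movedim(0, mode)

def hosvd_compress(activation, eps=0.9):
    """Truncated HOSVD of an activation map.

    For each mode, keep the smallest K whose singular values carry at least a
    fraction `eps` of the total energy (sum of squared singular values).
    Returns the core tensor and the per-mode factor matrices.
    """
    factors = []
    for mode in range(activation.dim()):
        U, S, _ = torch.linalg.svd(unfold(activation, mode), full_matrices=False)
        energy = torch.cumsum(S ** 2, dim=0) / torch.sum(S ** 2)
        K = int((energy < eps).sum().item()) + 1
        factors.append(U[:, :K])
    # Core tensor: project the activation onto the truncated bases, mode by mode.
    core = activation
    for mode, U in enumerate(factors):
        shape = list(core.shape)
        shape[mode] = U.shape[1]
        core = fold(U.T @ unfold(core, mode), mode, shape)
    return core, factors

def hosvd_reconstruct(core, factors):
    """Approximate the original activation from its core tensor and factors."""
    out = core
    for mode, U in enumerate(factors):
        shape = list(out.shape)
        shape[mode] = U.shape[0]
        out = fold(U @ unfold(out, mode), mode, shape)
    return out
```

Storing the core tensor and the small per-mode factor matrices in place of the full activation is what reduces the memory footprint; `hosvd_reconstruct` recovers the approximation whenever the activation is needed again.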
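
The Dataset Splits row reports a two-way non-i.i.d. partition built with FedAvg, each partition then split 80/20 into training and validation. The exact partitioning procedure is not spelled out in the excerpt, so the sketch below is only one plausible reading based on the label sort-and-shard scheme of the original FedAvg paper; the shard count, seed, and function name are assumptions.

```python
import numpy as np

def fedavg_style_split(labels, num_shards=4, seed=0):
    """Two equal-size non-i.i.d. partitions via label sort-and-shard, each then
    split 80% / 20% into train / validation index arrays (illustrative only)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                   # group sample indices by class
    shards = np.array_split(order, num_shards)   # label-contiguous shards
    perm = rng.permutation(num_shards)           # assign shards to partitions
    half = num_shards // 2
    partitions = [np.concatenate([shards[i] for i in perm[:half]]),
                  np.concatenate([shards[i] for i in perm[half:]])]
    splits = []
    for part in partitions:
        part = rng.permutation(part)             # shuffle within the partition
        cut = int(0.8 * len(part))
        splits.append((part[:cut], part[cut:]))  # (train indices, val indices)
    return splits
```

For example, calling `fedavg_style_split(np.array(train_set.targets))` on a hypothetical torchvision CIFAR-10 `train_set` would return the two (train, validation) index pairs.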
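
The Experiment Setup row lists the optimization hyperparameters. The sketch below wires them together in PyTorch (SGD with momentum 0.9 and weight decay 1 × 10⁻⁴, linear warm-up to 0.005 over 4 epochs, cosine annealing afterwards, L2 gradient clipping at 2.0, cross-entropy loss); stepping the schedule per batch rather than per epoch is an assumption, as are the helper names.

```python
import math
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

EPOCHS, WARMUP_EPOCHS, BASE_LR = 90, 4, 0.005

def make_optimizer_and_scheduler(model, steps_per_epoch):
    """SGD with the reported momentum and weight decay, plus linear warm-up
    followed by cosine annealing (scheduled per step here, by assumption)."""
    optimizer = SGD(model.parameters(), lr=BASE_LR,
                    momentum=0.9, weight_decay=1e-4)
    total_steps = EPOCHS * steps_per_epoch
    warmup_steps = WARMUP_EPOCHS * steps_per_epoch

    def lr_lambda(step):
        if step < warmup_steps:
            # Linear warm-up from 0 up to BASE_LR over the first 4 epochs.
            return step / max(1, warmup_steps)
        # Cosine annealing from BASE_LR down to 0 for the remaining epochs.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return optimizer, LambdaLR(optimizer, lr_lambda)

criterion = nn.CrossEntropyLoss()

def train_step(model, inputs, targets, optimizer, scheduler):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # L2 gradient clipping with the reported threshold of 2.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
    optimizer.step()
    scheduler.step()
    return loss.item()
```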