Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning

Authors: Le-Trung Nguyen, Aël Quélennec, Van-Tam Nguyen, Enzo Tartaglione

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our analysis and experiments demonstrate that our method can reduce activation memory usage, even up to 120.09× compared to vanilla training, while also reducing overall training FLOPs up to 1.86× when evaluated on traditional benchmarks. [...] We demonstrate the effectiveness of our method through various experiments, including ImageNet-1k, and performing real measurements on a resource-limited device like a Raspberry Pi 5 (Sec. 4).
Researcher Affiliation Academia LTCI, Télécom Paris, Institut Polytechnique de Paris, France. Correspondence to: Le-Trung Nguyen <EMAIL>, Enzo Tartaglione <EMAIL>.
Pseudocode Yes Algorithm 1: ASI for layer i with set of ranks r_i ∈ R_opt. Input: activation map A_i^(t) ∈ R^(B×C_i×H_i×W_i) at epoch t; target ranks for the 4 modes r_i ∈ N^4, with r_{i,m} ∈ [1, min(a_{i,m}, b_{i,m})], where (a_{i,m}, b_{i,m}) is the shape of activation map A_i^(t) at mode m. Function:...
Open Source Code Yes The code is available at https://github.com/LeTrungNguyen/ICML2025-ASI.git.
Open Datasets Yes To evaluate the effectiveness of ASI, we conduct classification tasks on six datasets: CIFAR-10, CIFAR-100 (Krizhevsky, 2009), CUB (Wah et al., 2011), Flowers (Nilsback & Zisserman, 2008), Pets (Zhang et al., 2022), and ImageNet (Deng et al., 2009).
Dataset Splits Yes The dataset is divided into two equal-sized, non-i.i.d. partitions using FedAvg (McMahan et al., 2017). Models are pretrained on the first partition, while the second partition is used for fine-tuning, with 80% allocated for training and the remaining 20% as the validation set. We fine-tune the provided checkpoints for 90 epochs, applying L2 gradient clipping with a threshold of 2.0.
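The 80/20 train/validation split quoted above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, seed, and shuffling strategy are assumptions.

```python
import random

def train_val_split(indices, val_frac=0.2, seed=0):
    """Shuffle example indices and split them into train and
    validation subsets (80/20 by default, per the quoted setup)."""
    rng = random.Random(seed)  # fixed seed: assumed, for reproducibility
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_frac))
    return shuffled[:cut], shuffled[cut:]

# Example: split 1000 fine-tuning examples 800/200.
train_idx, val_idx = train_val_split(range(1000))
```

Any deterministic shuffle-then-cut scheme yields the same 80/20 proportions; only the membership of each subset depends on the seed.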
Hardware Specification Yes All simulation experiments are conducted using PyTorch 1.13.1 on an NVIDIA Quadro RTX A4500 with 20GB of VRAM, while the on-device experiments are performed on a Raspberry Pi 5 with a Cortex-A76 CPU and 8GB RAM.
Software Dependencies Yes All simulation experiments are conducted using PyTorch 1.13.1 on an NVIDIA Quadro RTX A4500 with 20GB of VRAM, while the on-device experiments are performed on a Raspberry Pi 5 with a Cortex-A76 CPU and 8GB RAM.
Experiment Setup Yes The learning rate increases linearly over four warm-up epochs, reaching 0.005, and then follows a cosine annealing decay schedule. Momentum is fixed at 0, and the weight decay remains at 1×10⁻⁴. Additionally, L2 gradient clipping with a threshold of 2.0 is applied. Data augmentation techniques include random resizing, flipping, normalization, and mini-batching with a size of 64. The loss function is cross-entropy.
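The quoted schedule (linear warm-up to 0.005 over four epochs, then cosine annealing) can be sketched epoch-by-epoch as below. This is a hedged reconstruction, not the authors' implementation: the decay-to-zero endpoint and the 90-epoch horizon (taken from the Dataset Splits row) are assumptions.

```python
import math

PEAK_LR = 0.005     # peak learning rate, per the quoted setup
WARMUP_EPOCHS = 4   # linear warm-up duration, per the quoted setup
TOTAL_EPOCHS = 90   # fine-tuning length quoted under Dataset Splits

def lr_at_epoch(epoch: int) -> float:
    """Linear warm-up to PEAK_LR, then cosine annealing toward 0
    (final value of 0 is an assumption; the report does not state it)."""
    if epoch < WARMUP_EPOCHS:
        # ramp: 1/4, 2/4, 3/4, 4/4 of the peak over the first 4 epochs
        return PEAK_LR * (epoch + 1) / WARMUP_EPOCHS
    # cosine decay over the remaining epochs
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]
```

In PyTorch this pattern is typically composed from a linear warm-up scheduler chained with `CosineAnnealingLR`; the standalone function above just makes the shape of the curve explicit.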