Geometry-aware training of factorized layers in tensor Tucker format
Authors: Emanuele Zangrando, Steffen Schotthöfer, Gianluca Ceruti, Jonas Kusch, Francesco Tudisco
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The method's performance is further illustrated through a variety of experiments, showing remarkable training compression rates and comparable or even better performance than the full baseline and alternative layer factorization strategies. In the following, we conduct a series of experiments to evaluate the performance of the proposed method as compared to the full model and to standard layer factorization and model pruning baselines. |
| Researcher Affiliation | Academia | Emanuele Zangrando, School of Mathematics, Gran Sasso Science Institute, L'Aquila, Italy (emanuele.zangrando@gssi.it); Steffen Schotthöfer, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA (schotthoefers@ornl.gov); Gianluca Ceruti, Department of Mathematics, University of Innsbruck, Innsbruck, Austria (gianluca.ceruti@uibk.ac.at); Jonas Kusch, Department of Data Science, Norwegian University of Life Sciences, Ås, Norway (jonas.kusch@nmbu.no); Francesco Tudisco, School of Mathematics and Maxwell Institute, University of Edinburgh, Edinburgh, UK and School of Mathematics, Gran Sasso Science Institute, L'Aquila, Italy (f.tudisco@ed.ac.uk) |
| Pseudocode | Yes | Algorithm 1: TDLRT: Efficient Tensor Dynamical Low-Rank Training in Tucker format. Algorithm 2: TDLRT: Standard Dynamical Low-Rank Training of convolutions in Tucker format. |
| Open Source Code | Yes | The code is available in the supplementary material. |
| Open Datasets | Yes | The compression performance of TDLRT is evaluated on CIFAR10 and tiny-imagenet. We show in Figure 2 the accuracy history of LeNet5 on MNIST using TDLRT as compared to standard training on Tucker and CP decompositions. |
| Dataset Splits | Yes | All methods are trained using a batch size of 128 for 70 epochs each, as done in [79, 36]. All the baseline methods are trained with the SGD optimizer; the starting learning rate of 0.05 is reduced by a factor of 10 on plateaus, and momentum is chosen as 0.1 for all layers. |
| Hardware Specification | Yes | The experiments are performed on an Nvidia RTX3090, Nvidia RTX3070 and one Nvidia A100 80GB. |
| Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are mentioned, only the type of optimizer (SGD). |
| Experiment Setup | Yes | All methods are trained using a batch size of 128 for 70 epochs each, as done in [79, 36]. All the baseline methods are trained with the SGD optimizer; the starting learning rate of 0.05 is reduced by a factor of 10 on plateaus, and momentum is chosen as 0.1 for all layers. |
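
The algorithms named in the Pseudocode row operate on convolution kernels stored in Tucker form. As a point of reference only, the sketch below shows a standard Tucker-2 factorized 2D convolution in PyTorch (a small core tensor plus factor matrices on the two channel modes). It illustrates the layer format, not the paper's geometry-aware TDLRT training update; the class name, rank choices, and initialization are placeholders.

```python
import torch
from torch import nn


class TuckerConv2d(nn.Module):
    """Convolution whose kernel is kept in Tucker-2 form:
    W ≈ C ×₁ U_out ×₂ U_in, with factor matrices on the output- and
    input-channel modes and the spatial modes of the core kept full."""

    def __init__(self, in_ch, out_ch, kernel_size, rank_in, rank_out, padding=0):
        super().__init__()
        self.U_out = nn.Parameter(torch.randn(out_ch, rank_out) * 0.02)
        self.U_in = nn.Parameter(torch.randn(in_ch, rank_in) * 0.02)
        self.core = nn.Parameter(
            torch.randn(rank_out, rank_in, kernel_size, kernel_size) * 0.02)
        self.padding = padding

    def forward(self, x):
        # Contract the input channels down to rank_in via a 1x1 convolution.
        x = nn.functional.conv2d(x, self.U_in.t().unsqueeze(-1).unsqueeze(-1))
        # Spatial convolution with the small core tensor.
        x = nn.functional.conv2d(x, self.core, padding=self.padding)
        # Expand back to the full output channels via a 1x1 convolution with U_out.
        return nn.functional.conv2d(x, self.U_out.unsqueeze(-1).unsqueeze(-1))


if __name__ == "__main__":
    layer = TuckerConv2d(64, 128, kernel_size=3, rank_in=16, rank_out=32, padding=1)
    print(layer(torch.randn(8, 64, 32, 32)).shape)  # torch.Size([8, 128, 32, 32])
```

Because the factorized kernel is applied as a chain of three small convolutions, the full kernel is never materialized, which is what makes rank-based compression attractive at training time.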
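
The Dataset Splits and Experiment Setup rows quote the same training recipe: batch size 128, 70 epochs, SGD with an initial learning rate of 0.05, momentum 0.1, and a tenfold learning-rate reduction on plateaus. Below is a minimal PyTorch sketch of that recipe, assuming CIFAR-10 via torchvision and a placeholder model; the paper's factorized architectures and any TDLRT-specific update steps are not reproduced here.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyperparameters quoted in the table: batch size 128, 70 epochs,
# SGD with initial learning rate 0.05 and momentum 0.1,
# learning rate reduced by a factor of 10 on plateaus.
BATCH_SIZE, EPOCHS, LR, MOMENTUM = 128, 70, 0.05, 0.1

# Placeholder data pipeline (CIFAR-10, one of the datasets used in the evaluation).
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
val_set = datasets.CIFAR10("data", train=False, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE)

# Stand-in for the factorized network under test; not taken from the paper's code.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM)
# Drop the learning rate by 10x when the validation loss plateaus.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for x, y in val_loader:
            val_loss += criterion(model(x), y).item()
    scheduler.step(val_loss / len(val_loader))
```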