Geometry-aware training of factorized layers in tensor Tucker format

Authors: Emanuele Zangrando, Steffen Schotthöfer, Gianluca Ceruti, Jonas Kusch, Francesco Tudisco

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The method's performance is further illustrated through a variety of experiments, showing remarkable training compression rates and comparable or even better performance than the full baseline and alternative layer factorization strategies. In the following, we conduct a series of experiments to evaluate the performance of the proposed method as compared to the full model and to standard layer factorization and model pruning baselines.
Researcher Affiliation | Academia | Emanuele Zangrando, School of Mathematics, Gran Sasso Science Institute, L'Aquila, Italy (emanuele.zangrando@gssi.it); Steffen Schotthöfer, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA (schotthoefers@ornl.gov); Gianluca Ceruti, Department of Mathematics, University of Innsbruck, Innsbruck, Austria (gianluca.ceruti@uibk.ac.at); Jonas Kusch, Department of Data Science, Norwegian University of Life Sciences, Ås, Norway (jonas.kusch@nmbu.no); Francesco Tudisco, School of Mathematics and Maxwell Institute, University of Edinburgh, Edinburgh, UK, and School of Mathematics, Gran Sasso Science Institute, L'Aquila, Italy (f.tudisco@ed.ac.uk)
Pseudocode | Yes | Algorithm 1: TDLRT: Efficient Tensor Dynamical Low-Rank Training in Tucker format. Algorithm 2: TDLRT: Standard Dynamical Low-Rank Training of convolutions in Tucker format.
Open Source Code | Yes | The code is available in the supplementary material.
Open Datasets | Yes | The compression performance of TDLRT is evaluated on CIFAR10 and tiny-imagenet. We show in Figure 2 the accuracy history of LeNet5 on MNIST using TDLRT as compared to standard training on Tucker and CP decompositions.
Dataset Splits | Yes | All methods are trained using a batch size of 128 for 70 epochs each, as done in [79, 36]. All the baseline methods are trained with the SGD optimizer; the starting learning rate of 0.05 is reduced by a factor of 10 on plateaus, and momentum is chosen as 0.1 for all layers.
Hardware Specification | Yes | The experiments are performed on an Nvidia RTX3090, an Nvidia RTX3070, and one Nvidia A100 80GB.
Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow) are mentioned; only the type of optimizer (SGD) is specified.
Experiment Setup | Yes | All methods are trained using a batch size of 128 for 70 epochs each, as done in [79, 36]. All the baseline methods are trained with the SGD optimizer; the starting learning rate of 0.05 is reduced by a factor of 10 on plateaus, and momentum is chosen as 0.1 for all layers.
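
For context on the Tucker-format layers named in the Pseudocode row, the following is a minimal PyTorch sketch of a convolution whose kernel is stored as a Tucker core plus one factor matrix per mode and reconstructed on the fly in the forward pass. It is an illustration only, not the authors' TDLRT algorithm (which additionally evolves the factors along the low-rank manifold during training); the class name TuckerConv2d, the ranks, and the initialization scale are hypothetical choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TuckerConv2d(nn.Module):
    """Convolution whose kernel is stored in Tucker format: a core plus four factor matrices."""

    def __init__(self, in_ch, out_ch, kernel_size, ranks):
        super().__init__()
        r_out, r_in, r_h, r_w = ranks
        self.padding = kernel_size // 2
        # Tucker core and one factor per kernel mode:
        # (output channels, input channels, kernel height, kernel width).
        self.core = nn.Parameter(0.02 * torch.randn(r_out, r_in, r_h, r_w))
        self.U_out = nn.Parameter(0.02 * torch.randn(out_ch, r_out))
        self.U_in = nn.Parameter(0.02 * torch.randn(in_ch, r_in))
        self.U_h = nn.Parameter(0.02 * torch.randn(kernel_size, r_h))
        self.U_w = nn.Parameter(0.02 * torch.randn(kernel_size, r_w))

    def forward(self, x):
        # Reconstruct the full kernel W = core x_1 U_out x_2 U_in x_3 U_h x_4 U_w,
        # then apply a standard convolution with it.
        weight = torch.einsum(
            "abcd,oa,ib,hc,wd->oihw",
            self.core, self.U_out, self.U_in, self.U_h, self.U_w,
        )
        return F.conv2d(x, weight, padding=self.padding)

layer = TuckerConv2d(in_ch=64, out_ch=128, kernel_size=3, ranks=(16, 16, 3, 3))
out = layer(torch.randn(8, 64, 32, 32))  # output shape: (8, 128, 32, 32)

Storing only the core and factor matrices instead of the full kernel is what yields the compression; the paper's contribution concerns how these factors are updated during training in a geometry-aware way.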
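
The Dataset Splits and Experiment Setup rows quote the same training configuration (batch size 128, 70 epochs, SGD with starting learning rate 0.05 and momentum 0.1, learning rate reduced by a factor of 10 on plateaus). Below is a rough PyTorch/torchvision sketch of that configuration on CIFAR10; the backbone model and the metric passed to the plateau scheduler are placeholders, not the authors' exact pipeline.

import torch
import torchvision
import torchvision.transforms as T

# CIFAR10 with the quoted batch size of 128.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor()
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torch.nn.Sequential(  # placeholder backbone, not the paper's architecture
    torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(32, 10),
)
criterion = torch.nn.CrossEntropyLoss()
# SGD with starting learning rate 0.05 and momentum 0.1, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.1)
# Reduce the learning rate by a factor of 10 when the tracked metric plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

for epoch in range(70):  # 70 epochs, as quoted above
    epoch_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)  # the plateau metric used here is a placeholder choice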