Robust low-rank training via approximate orthonormal constraints
Authors: Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This is shown by extensive numerical evidence and by our main approximation theorem that shows the computed robust low-rank network well-approximates the ideal full model, provided a highly performing low-rank sub-network exists. ... We provide several experimental evaluations on different architectures and datasets, where the robust low-rank networks are compared against a variety of baselines. |
| Researcher Affiliation | Academia | Dayana Savostianova Gran Sasso Science Institute 67100 L'Aquila (Italy) dayana.savostianova@gssi.it Emanuele Zangrando Gran Sasso Science Institute 67100 L'Aquila (Italy) emanuele.zangrando@gssi.it Gianluca Ceruti University of Innsbruck 6020 Innsbruck (Austria) gianluca.ceruti@uibk.ac.at Francesco Tudisco Gran Sasso Science Institute 67100 L'Aquila (Italy) francesco.tudisco@gssi.it |
| Pseudocode | Yes | Algorithm 1: Pseudocode of robust well-Conditioned Low-Rank (CondLR) training scheme |
| Open Source Code | Yes | All the experiments can be reproduced with the code in PyTorch available at https://github.com/COMPiLELab/CondLR. |
| Open Datasets | Yes | We consider MNIST, CIFAR10, and CIFAR100 [33] datasets for evaluation purposes. ... [33] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009. |
| Dataset Splits | No | The paper mentions "60,000 training images" and "10,000 test images" for MNIST, and similar counts for CIFAR10/100, but does not specify any explicit validation dataset splits (e.g., percentages or exact counts for a validation set). |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions "PyTorch" as the framework used for the code, but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Each method and model was trained for 120 epochs of stochastic gradient descent with a minibatch size of 128. We used a learning rate of 0.1 for LeNet5 and 0.05 for VGG16 with momentum 0.3 and 0.45, respectively, and a learning rate scheduler with factor = 0.3 at 70 and 100 epochs. |
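The Pseudocode row above refers to the paper's Algorithm 1 (CondLR). As a rough illustration only, the sketch below shows one common way to train a low-rank factorized layer W = U S Vᵀ while softly encouraging U and V to stay orthonormal, which keeps the factorization well conditioned. The layer structure, the Frobenius-norm penalty, and the weight `beta` are assumptions made for illustration; they are not the paper's exact CondLR algorithm.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer parameterized as W = U @ S @ V.T with rank-r factors."""
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank**0.5)
        self.S = nn.Parameter(torch.eye(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / rank**0.5)

    def forward(self, x):
        return x @ self.V @ self.S.T @ self.U.T

    def orthonormality_penalty(self):
        # Soft penalty pushing U^T U and V^T V toward the identity.
        eye = torch.eye(self.S.shape[0], device=self.U.device)
        return ((self.U.T @ self.U - eye).pow(2).sum()
                + (self.V.T @ self.V - eye).pow(2).sum())

# One training step: task loss plus the orthonormality penalty.
layer = LowRankLinear(784, 256, rank=32)
opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.3)
x, y = torch.randn(128, 784), torch.randn(128, 256)
beta = 1.0  # hypothetical penalty weight, not taken from the paper
loss = nn.functional.mse_loss(layer(x), y) + beta * layer.orthonormality_penalty()
opt.zero_grad()
loss.backward()
opt.step()
```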
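For concreteness, the hyperparameters quoted in the Experiment Setup row map directly onto a standard PyTorch optimizer and scheduler configuration. The sketch below uses the LeNet5 values (120 epochs, batch size 128, SGD with lr 0.1 and momentum 0.3, learning rate multiplied by 0.3 at epochs 70 and 100); the placeholder model, the dummy data standing in for MNIST, and the use of `MultiStepLR` are assumptions about how the schedule was implemented.

```python
import torch
import torch.nn as nn

EPOCHS, BATCH_SIZE = 120, 128

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[70, 100], gamma=0.3)  # lr *= 0.3 at epochs 70 and 100

# Dummy tensors standing in for the MNIST training set used in the paper.
data = torch.utils.data.TensorDataset(
    torch.randn(512, 1, 28, 28), torch.randint(0, 10, (512,)))
train_loader = torch.utils.data.DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate on the epoch schedule
```

For the VGG16 runs the paper reports a learning rate of 0.05 and momentum of 0.45, with the same scheduler milestones.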