Initialization and Regularization of Factorized Neural Layers
Authors: Mikhail Khodak, Neil A. Tenenholtz, Lester Mackey, Nicolò Fusi
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we highlight the benefits of spectral initialization and Frobenius decay across a variety of settings. In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks |
| Researcher Affiliation | Collaboration | Mikhail Khodak, Carnegie Mellon University, khodak@cmu.edu; Neil Tenenholtz, Lester Mackey, Nicolò Fusi, Microsoft Research, {netenenh,lmackey,fusi}@microsoft.com |
| Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce our results is available here: https://github.com/microsoft/fnl_paper. |
| Open Datasets | Yes | In Table 1 we see that the low-rank approach, with SI & FD, dominates at the higher memory settings of ResNet across all three datasets considered, often outperforming even approaches that train an uncompressed model first. It is also close to the best compressed training approach in the lowest memory setting for CIFAR-100 (Krizhevsky, 2009) and Tiny-ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe the methodology for creating validation splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and refers to several GitHub repositories for code, but it does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | All models are trained for 200 epochs with the same optimizer settings as for the unfactorized models; the weight-decay coefficient is left unchanged when replacing it by FD, and we use a warmup epoch with a 10 times smaller learning rate for ResNet56 for stability. |
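
For context on the techniques named in the quotes above, here is a minimal PyTorch sketch of a factorized linear layer with spectral initialization (SI) and Frobenius decay (FD). This is an illustration only, not the authors' released code (see the linked repository for that); the class name, the Kaiming initialization of the full matrix, and the rank handling are assumptions.

```python
import torch
import torch.nn as nn


class FactorizedLinear(nn.Module):
    """Low-rank layer parameterizing W ~= U @ V.T (illustrative sketch only)."""

    def __init__(self, in_features, out_features, rank):
        super().__init__()
        # Spectral initialization (SI): draw a full matrix with a standard
        # initializer, keep its top-`rank` singular directions, and split the
        # square root of each singular value between the two factors.
        full = torch.empty(out_features, in_features)
        nn.init.kaiming_normal_(full)
        U, S, Vh = torch.linalg.svd(full, full_matrices=False)
        sqrt_s = S[:rank].sqrt()
        self.U = nn.Parameter(U[:, :rank] * sqrt_s)      # shape (out, rank)
        self.V = nn.Parameter(Vh[:rank, :].T * sqrt_s)   # shape (in, rank)

    def forward(self, x):
        # Equivalent to x @ (U @ V.T).T without forming the full matrix.
        return x @ self.V @ self.U.T

    def frobenius_decay(self):
        # Frobenius decay (FD): penalize ||U V^T||_F^2 of the product rather
        # than the separate Frobenius norms of U and V; computed as
        # trace((U^T U)(V^T V)) to avoid materializing U V^T.
        return ((self.U.T @ self.U) * (self.V.T @ self.V)).sum()
```

The FD term from each factorized layer is added to the training loss in place of weight decay, using the same coefficient, which matches the quoted statement that the weight-decay coefficient is left unchanged.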
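
Building on that sketch, the quoted experiment setup (200 epochs with the unfactorized models' optimizer settings, weight decay replaced by FD at the same coefficient, and a warmup epoch at a 10 times smaller learning rate for ResNet56) could be wired up roughly as follows. The model, data, learning rate, momentum, and FD coefficient here are synthetic placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Tiny synthetic stand-ins so the sketch runs end to end; the real experiments
# train ResNets on CIFAR/Tiny-ImageNet with the unfactorized models' settings.
model = FactorizedLinear(32, 10, rank=4)
train_loader = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(5)]
criterion = nn.CrossEntropyLoss()

base_lr, fd_coef, epochs = 0.1, 1e-4, 200  # placeholder hyperparameters
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9,
                            weight_decay=0.0)  # FD replaces built-in weight decay

for epoch in range(epochs):
    # Warmup: the first epoch runs at a 10 times smaller learning rate
    # (used for ResNet56 in the quoted setup).
    for group in optimizer.param_groups:
        group["lr"] = base_lr / 10 if epoch == 0 else base_lr

    for x, y in train_loader:
        loss = criterion(model(x), y)
        # Add Frobenius decay over all factorized layers, keeping the
        # coefficient that weight decay would otherwise have used.
        loss = loss + fd_coef * sum(
            m.frobenius_decay()
            for m in model.modules()
            if isinstance(m, FactorizedLinear)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```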