Initialization and Regularization of Factorized Neural Layers

Authors: Mikhail Khodak, Neil A. Tenenholtz, Lester Mackey, Nicolò Fusi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we highlight the benefits of spectral initialization and Frobenius decay across a variety of settings. In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks." (See the sketch after this table.)
Researcher Affiliation | Collaboration | Mikhail Khodak, Carnegie Mellon University, khodak@cmu.edu; Neil Tenenholtz, Lester Mackey, Nicolò Fusi, Microsoft Research, {netenenh,lmackey,fusi}@microsoft.com
Pseudocode | No | The paper describes methods in prose and mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce our results is available here: https://github.com/microsoft/fnl_paper."
Open Datasets | Yes | "In Table 1 we see that the low-rank approach, with SI & FD, dominates at the higher memory settings of ResNet across all three datasets considered, often outperforming even approaches that train an uncompressed model first. It is also close to the best compressed training approach in the lowest memory setting for CIFAR-100 (Krizhevsky, 2009) and Tiny-ImageNet (Deng et al., 2009)."
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe the methodology for creating validation splits, such as percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using PyTorch and refers to several GitHub repositories for code, but it does not specify exact version numbers for any software dependencies.
Experiment Setup | Yes | "All models are trained for 200 epochs with the same optimizer settings as for the unfactorized models; the weight-decay coefficient is left unchanged when replacing it by FD, and we use a warmup epoch with a 10-times smaller learning rate for ResNet56 for stability." (See the training-schedule sketch after this table.)
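The Research Type row above refers to spectral initialization (SI) and Frobenius decay (FD) for factorized, low-rank layers. Below is a minimal PyTorch sketch of both ideas for a linear layer; the FactorizedLinear name, the Kaiming initialization of the full-rank weight, and the rank argument are our own illustrative choices, not the authors' reference implementation (that lives in the linked repository).

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    """Low-rank linear layer W ~ U @ V, sketching spectral initialization (SI)
    and a Frobenius-decay (FD) penalty. Illustrative only; see the authors' repo
    (https://github.com/microsoft/fnl_paper) for the actual implementation."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Initialize a full-rank weight the usual way, then take its truncated SVD.
        full = torch.empty(out_features, in_features)
        nn.init.kaiming_normal_(full)  # assumed standard init of the unfactorized layer
        U, S, Vh = torch.linalg.svd(full, full_matrices=False)
        sqrt_S = S[:rank].sqrt()
        # Spectral initialization: keep the top-rank singular directions and
        # split the singular values evenly across the two factors.
        self.U = nn.Parameter(U[:, :rank] * sqrt_S)             # (out_features, rank)
        self.V = nn.Parameter(Vh[:rank, :] * sqrt_S[:, None])   # (rank, in_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.U @ self.V).t()

    def frobenius_decay(self) -> torch.Tensor:
        # Penalize ||U V||_F^2 of the reconstructed product rather than
        # ||U||_F^2 + ||V||_F^2 of the individual factors.
        return (self.U @ self.V).pow(2).sum()
```

In use, one would add the scaled frobenius_decay() term to the training loss and turn off standard weight decay on the factor parameters, so that FD replaces the usual penalty on them, consistent with the Experiment Setup row's note that the weight-decay coefficient is otherwise left unchanged.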
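The Experiment Setup row quotes a 200-epoch schedule with an optional warmup epoch at a 10-times smaller learning rate for ResNet56. A minimal sketch of such a schedule follows; the base learning rate, momentum, and helper name are assumptions for illustration, not values from the paper.

```python
import torch

def build_optimizer_and_schedule(model, base_lr=0.1, warmup=False):
    """Sketch of the quoted setup: train with the unfactorized model's optimizer
    settings, optionally preceded by one warmup epoch at base_lr / 10 (used for
    ResNet56). base_lr and momentum here are assumed, not taken from the paper."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

    def lr_lambda(epoch: int) -> float:
        # 10x smaller learning rate during the first (warmup) epoch only.
        return 0.1 if (warmup and epoch == 0) else 1.0

    # Call scheduler.step() once at the end of each epoch over the 200-epoch run.
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```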