Implicit Regularization in Deep Learning May Not Be Explainable by Norms

Authors: Noam Razin, Nadav Cohen

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We corroborate the analysis with empirical demonstrations. Our experiments show that in analogy with matrix factorization, gradient descent on a tensor factorization tends to produce solutions with low rank, where rank is defined in the context of tensors. Similarly to how matrix factorization corresponds to a linear neural network whose input-output mapping is represented by a matrix, it is known (see [22]) that tensor factorization corresponds to a convolutional arithmetic circuit (certain type of non-linear neural network) whose input-output mapping is represented by a tensor. We thus obtain a second exemplar of a neural network architecture whose implicit regularization strives to lower a notion of rank for its input-output mapping. This leads us to believe that the phenomenon may be general, and formalizing notions of rank for input-output mappings of contemporary models may be key to explaining generalization in deep learning. The remainder of the paper is organized as follows. Section 2 presents the deep matrix factorization model. Section 3 delivers our analysis, showing that its implicit regularization can drive all norms to infinity. Experiments, with both the analyzed setting and tensor factorization, are given in Section 4.
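To make the quoted tensor factorization experiment concrete, the following is a minimal sketch (not the authors' code) of gradient descent over a rank-R CP tensor factorization fitted to a few observed entries. The tensor size, the observed entries, and all hyperparameters are illustrative assumptions; the qualitative point, in line with the quoted passage, is that with small initialization only a few CP components tend to grow, i.e. the learned tensor tends toward low tensor rank.

```python
# Minimal sketch, NOT the authors' code: gradient descent on a rank-R CP
# tensor factorization fitted to a few observed entries. Tensor size, the
# observed entries, and all hyperparameters are illustrative assumptions.
import torch

torch.manual_seed(0)

d, R = 4, 6                                         # order-3 tensor of size d x d x d, R components
obs = [(0, 1, 2), (1, 0, 3), (2, 2, 0), (3, 1, 1)]  # hypothetical observed index triplets
vals = torch.tensor([1.0, 1.0, 0.0, 1.0])           # hypothetical observed values

init_std = 1e-3                                     # small initialization, per the paper's protocol
factors = [torch.randn(R, d) * init_std for _ in range(3)]
for f in factors:
    f.requires_grad_(True)

def entry(i, j, k):
    # CP entry: sum_r A[r, i] * B[r, j] * C[r, k]
    return (factors[0][:, i] * factors[1][:, j] * factors[2][:, k]).sum()

opt = torch.optim.SGD(factors, lr=1e-2)
for step in range(10000):
    loss = sum((entry(i, j, k) - v) ** 2
               for (i, j, k), v in zip(obs, vals)) / len(obs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# With small initialization, typically only a few CP components end up with
# non-negligible norm, i.e. the fitted tensor tends toward low tensor rank.
with torch.no_grad():
    comp_norms = (factors[0].norm(dim=1)
                  * factors[1].norm(dim=1)
                  * factors[2].norm(dim=1))
    print(sorted(comp_norms.tolist(), reverse=True))
```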
Researcher Affiliation | Academia | Noam Razin, Tel Aviv University, noam.razin@cs.tau.ac.il; Nadav Cohen, Tel Aviv University, cohennadav@cs.tau.ac.il
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | No | The paper defines a specific synthetic matrix completion problem with observations: 'Ω = {(1,2), (2,1), (2,2)}, b_{1,2} = 1, b_{2,1} = 1, b_{2,2} = 0.' This is not a publicly available dataset in the conventional sense that would require access information.
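Since the observed entries above fully specify the paper's analyzed setting, it can be reproduced in a few lines. The following is a minimal sketch (depth, learning rate, initialization scale, and step count are assumptions, not the paper's exact values) of gradient descent over a deep matrix factorization on this 2x2 completion problem; in line with the paper's analysis, the unobserved entry w_{1,1}, and with it every norm of the product matrix, tends to grow while the ratio of singular values shrinks, i.e. effective rank decreases.

```python
# Minimal sketch, NOT the authors' code: gradient descent over a depth-N matrix
# factorization on the 2x2 completion problem quoted above. Depth, learning
# rate, initialization scale, and step count are assumptions.
import torch

torch.manual_seed(0)
depth, lr, init_std = 3, 1e-2, 1e-2
layers = [torch.randn(2, 2) * init_std for _ in range(depth)]
for W in layers:
    W.requires_grad_(True)

# Observed entries (1-indexed in the paper): Omega = {(1,2), (2,1), (2,2)},
# with b_{1,2} = 1, b_{2,1} = 1, b_{2,2} = 0.
obs = [((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]

opt = torch.optim.SGD(layers, lr=lr)
for step in range(100001):
    W_prod = layers[0]
    for W in layers[1:]:
        W_prod = W @ W_prod                        # end-to-end matrix W_N ... W_1
    loss = sum((W_prod[i, j] - v) ** 2 for (i, j), v in obs) / len(obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 20000 == 0:
        with torch.no_grad():
            s = torch.linalg.svdvals(W_prod)       # singular values, descending
            print(f"step {step}: w11 = {W_prod[0, 0].item():.3f}, "
                  f"nuclear norm = {s.sum().item():.3f}, "
                  f"sigma2/sigma1 = {(s[1] / s[0]).item():.3f}")
```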
Dataset Splits | No | The paper defines a synthetic matrix completion problem and analyzes implicit regularization; it does not describe standard training, validation, or test splits for a pre-existing dataset. Training is performed on 'observed entries' and generalization is evaluated on 'unobserved entries', which is inherent to the matrix completion task rather than a typical train/validation/test split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory).
Software Dependencies | No | The paper cites PyTorch (reference [68]) but does not list specific software dependencies or version numbers for its own implementation.
Experiment Setup | Yes | Independently for each depth, runs were iteratively carried out, with both learning rate and standard deviation for initialization decreased after each run, until the point where further reduction did not yield a noticeable change (presented runs are those from the last iterations of this process). ... For gradient descent over tensor factorization, we employed an adaptive learning rate scheme to reduce run times (see Subappendix F.2 for details), and iteratively ran with decreasing standard deviation for initialization, until the point at which further reduction did not yield a noticeable change (presented results are those from the last iterations of this process, with the corresponding standard deviations annotated by init).
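The iterative protocol quoted above (rerun with a smaller learning rate and initialization standard deviation until results stop changing) can be summarized by a short driver loop. This is a sketch of one plausible reading of that protocol, not the authors' code; `run_experiment`, the shrink factor, and the tolerance are hypothetical.

```python
# Minimal sketch of one plausible reading of the protocol quoted above, NOT the
# authors' code: rerun training with a progressively smaller learning rate and
# initialization standard deviation until the monitored quantity stabilizes.
# `run_experiment`, `shrink`, and `tol` are hypothetical.

def stabilize(run_experiment, lr=1e-1, init_std=1e-1, shrink=0.5, tol=1e-2, max_iters=10):
    prev = run_experiment(lr=lr, init_std=init_std)
    for _ in range(max_iters):
        lr *= shrink
        init_std *= shrink
        curr = run_experiment(lr=lr, init_std=init_std)
        if abs(curr - prev) < tol:   # further reduction yields no noticeable change
            return curr, lr, init_std
        prev = curr
    return prev, lr, init_std
```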