Implicit Regularization in Deep Matrix Factorization
Authors: Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our first finding, supported by theory and experiments, is that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions, oftentimes leading to more accurate recovery. Secondly, we present theoretical and empirical arguments questioning a nascent view by which implicit regularization in matrix factorization can be captured using simple mathematical norms. |
| Researcher Affiliation | Academia | Sanjeev Arora, Princeton University and Institute for Advanced Study, arora@cs.princeton.edu; Nadav Cohen, Tel Aviv University, cohennadav@cs.tau.ac.il; Wei Hu, Princeton University, huwei@cs.princeton.edu; Yuping Luo, Princeton University, yupingl@cs.princeton.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code is provided or link to a code repository for the described methodology. |
| Open Datasets | Yes | Figure 3: Dynamics of gradient descent over deep matrix factorizations; specifically, evolution of singular values and singular vectors of the product matrix during training for matrix completion. Top row corresponds to the task of completing a random rank-5 matrix of size 100 × 100 based on 2000 randomly chosen observed entries; bottom row corresponds to training on 10000 entries chosen randomly from the MovieLens 100K dataset (completion of a 943 × 1682 matrix, cf. [24]). |
| Dataset Splits | No | The paper describes training on a portion of the data and testing on unobserved entries but does not specify a separate validation split or cross-validation setup with explicit percentages or counts for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Both learning rate and standard deviation of (random, zero-centered) initialization for gradient descent were set to the small value 10⁻³. Initializing with smaller standard deviation had no observable effect on results of depth 3 (and 4), but did impact those of depth 2; the outcomes of dividing standard deviation by 2 and by 4 are included in the plots. |
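
For concreteness, the sketch below mirrors the synthetic completion setup quoted in the "Open Datasets" and "Experiment Setup" rows: a deep factorization whose product matrix W₁⋯Wₙ is trained by plain gradient descent on the observed entries of a random rank-5, 100 × 100 matrix, with learning rate and initialization standard deviation both 10⁻³. The depth, step count, ground-truth construction, and use of NumPy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantities taken from the quoted text: 100 x 100 target of rank 5,
# 2000 observed entries, learning rate and init std both 1e-3.
d, rank, n_obs = 100, 5, 2000
lr = 1e-3
init_std = 1e-3
depth = 3          # assumption: one of the depths studied (2, 3, 4)
steps = 50_000     # assumption: the paper does not state a step count

# Assumed construction of the ground-truth rank-5 matrix
# (scaled to roughly unit-variance entries).
target = rng.standard_normal((d, rank)) @ rng.standard_normal((rank, d)) / np.sqrt(rank)
mask = np.zeros((d, d), dtype=bool)
mask.flat[rng.choice(d * d, size=n_obs, replace=False)] = True

# Deep matrix factorization: the product matrix is W1 @ W2 @ ... @ Wn,
# with each factor initialized from a zero-centered Gaussian of std init_std.
factors = [init_std * rng.standard_normal((d, d)) for _ in range(depth)]

def product(mats):
    """Multiply a list of matrices left to right (identity for an empty list)."""
    out = np.eye(d)
    for m in mats:
        out = out @ m
    return out

for step in range(steps):
    W = product(factors)
    # Gradient of 0.5 * ||observed(W - target)||_F^2 with respect to the product W.
    resid = np.where(mask, W - target, 0.0)
    # Chain rule: for W = L @ Wj @ R, dLoss/dWj = L.T @ resid @ R.T.
    grads = [product(factors[:j]).T @ resid @ product(factors[j + 1:]).T
             for j in range(depth)]
    for f, g in zip(factors, grads):
        f -= lr * g

W = product(factors)
unseen = ~mask
rel_err = np.linalg.norm((W - target)[unseen]) / np.linalg.norm(target[unseen])
print(f"relative error on unobserved entries: {rel_err:.3f}")
```

Lowering `init_std` (dividing it by 2 or by 4, as the quoted text describes for depth 2) is a one-line change to the constant above; the depth is likewise controlled by a single constant.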