Characterizing the Loss Landscape in Non-Negative Matrix Factorization
Authors: Johan Bjorck, Anmol Kabra, Kilian Q. Weinberger, Carla P. Gomes
AAAI 2021, pp. 6768-6776
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that such a property holds with high probability for NMF, provably in a non-worst-case model with a planted solution, and empirically across an extensive suite of real-world NMF problems spanning collaborative filtering, scientific analysis, and image analysis. Our analysis predicts that this property becomes more likely with a growing number of parameters, and experiments suggest that a similar trend might also hold for deep neural networks, turning increasing dataset sizes and model sizes into a blessing from an optimization perspective. |
| Researcher Affiliation | Academia | Johan Bjorck, Anmol Kabra, Kilian Q. Weinberger, Carla P. Gomes Cornell University {njb225,ak2426,kqw4,gomes}@cornell.edu |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In Table 1, we list these datasets together with their sparsity: movielens, movie ratings, (3953, 6041, 20), sparsity 0.0419 (Harper and Konstan 2016); netflix, movie/tv-show ratings, (47928, 8963, 20), sparsity 0.0121 (Zhou et al. 2008); goodbooks, book ratings, (10000, 43461, 50), sparsity 0.0022 (Kula 2017). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment. It mentions computing 'loss only over observed entries' but not explicit splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper implicitly references software such as PyTorch (e.g., via its ResNet experiments) but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | For simplicity, we use the same learning rate of 1e-5 for all datasets and run gradient descent until the rate of relative improvement in the loss falls below 1e-7. We initialize decomposition matrices using the half-normal distribution, which is scaled so that the mean matches with that of the dataset. To enable comparison between datasets, we scale all data matrices so that the variance of observed entries is one, and divide the loss function by the number of (observed) entries. |
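
The experiment-setup row above can be made concrete with a minimal NumPy sketch of masked-loss gradient descent for NMF. It follows the reported constants (learning rate 1e-5, relative-improvement tolerance 1e-7, half-normal initialization, unit-variance rescaling of observed entries, and a loss averaged over observed entries), but the non-negativity projection, the exact initialization scaling, the rank, and all names (`masked_nmf_gd`, `rel_tol`, etc.) are illustrative assumptions rather than the authors' released implementation.

```python
import numpy as np


def masked_nmf_gd(X, mask, rank=20, lr=1e-5, rel_tol=1e-7, max_iters=100_000, seed=0):
    """Gradient descent on a squared loss computed only over observed entries.

    X    : (m, n) data matrix; only entries where mask is True are observed.
    mask : (m, n) boolean array marking observed entries.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    n_obs = mask.sum()

    # Rescale the data so the variance of the observed entries is one (as reported).
    X = X / X[mask].std()

    # Half-normal initialization, rescaled so the mean of the reconstruction
    # W @ H matches the mean of the observed data -- one plausible reading of
    # "scaled so that the mean matches with that of the dataset" (assumption).
    factor_mean = np.sqrt(X[mask].mean() / rank)
    scale = factor_mean / np.sqrt(2.0 / np.pi)  # E|N(0, 1)| = sqrt(2 / pi)
    W = np.abs(rng.standard_normal((m, rank))) * scale
    H = np.abs(rng.standard_normal((rank, n))) * scale

    def loss(W, H):
        resid = (W @ H - X) * mask
        return float((resid ** 2).sum()) / n_obs

    prev = loss(W, H)
    for _ in range(max_iters):
        resid = (W @ H - X) * mask            # errors on observed entries only
        grad_W = (2.0 / n_obs) * resid @ H.T
        grad_H = (2.0 / n_obs) * W.T @ resid
        W = np.clip(W - lr * grad_W, 0.0, None)   # simple non-negativity projection (assumption)
        H = np.clip(H - lr * grad_H, 0.0, None)
        cur = loss(W, H)
        if prev > 0 and (prev - cur) / prev < rel_tol:  # stop on small relative improvement
            break
        prev = cur
    return W, H, cur
```

The clipping step is just one straightforward way to keep the factors non-negative under plain gradient descent; the paper does not specify whether a projection, a reparameterization, or unconstrained descent was used, so that detail should be checked against the authors' description before reuse.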