Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates
Authors: Yuanzhi Li, Yingyu Liang, Andrej Risteski
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper proposes an algorithm that alternates between decoding the weights and updating the features, and shows that assuming a generative model of the data, it provably recovers the groundtruth under fairly mild conditions. The analysis relies on a carefully designed coupling between two potential functions, which we believe is of independent interest. |
| Researcher Affiliation | Academia | Yuanzhi Li, Yingyu Liang, Andrej Risteski Computer Science Department at Princeton University 35 Olden St, Princeton, NJ 08540 EMAIL |
| Pseudocode | Yes | Algorithm 1 Purification Input: initialization A(0), threshold α, step size η, scaling factor r, sample size N, iterations T 1: for t = 0, 1, 2, ..., T 1 do 2: Draw examples y1, . . . , y N. 3: (Decode) Compute A , the pseudo-inverse of A(t) with minimum (A) . Set x = φα(A y) for each example y. // φα is Re LU activation; see (2) for the definition 4: (Update) Update the feature matrix A(t+1) = (1 η) A(t) + rηˆE (y y )(x x ) where ˆE is over independent uniform y, y from {y1, . . . , y N}, and x, x are their decodings. Output: A = A(T ) |
| Open Source Code | No | The paper refers to a third-party software (lda-c) with a GitHub link, but does not state that the code for its own methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and assumes a generative model for data; it does not utilize or mention a specific dataset for training or evaluation. |
| Dataset Splits | No | The paper focuses on theoretical analysis and does not describe experimental validation or dataset splits for such validation. |
| Hardware Specification | No | The paper is theoretical and does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |