Convergence of Alternating Gradient Descent for Matrix Factorization
Authors: Rachel Ward, Tamara Kolda
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. |
| Researcher Affiliation | Collaboration | Rachel Ward (University of Texas, Austin, TX; rward@math.utexas.edu) and Tamara G. Kolda (MathSci.ai, Dublin, CA; tammy.kolda@mathsci.ai) |
| Pseudocode | No | The paper describes the alternating gradient descent update rule in Assumption 1 but does not present it as a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to a source-code repository for the methodology described in this paper. |
| Open Datasets | No | The paper uses a synthetic matrix constructed as A = UΣVᵀ with U and V random 100 × 5 orthonormal matrices, but does not provide access, exact reproduction instructions (e.g., a random seed), or a link to this specific matrix. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We factorize a rank-5 (r = 5) matrix of size 100 × 100. The matrix is constructed as A = UΣVᵀ with U and V random 100 × 5 orthonormal matrices and singular value ratio σr(A)/σ1(A) = 0.9. ... In all experiments we use the defaults C = 4 and D = Cν/9 with ν = 1e-10 for computing the proposed initialization as well as the theoretical step size. ... The proposed initialization sets X0 proportional to AΦ and Y0 proportional to Φ, where Φ is a random n × d matrix and the scaling factors are determined by C, D, σ1, the step size η, d, and n. |
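Since the paper releases no code, the synthetic matrix described in the Experiment Setup row can only be approximated. The following NumPy sketch constructs a rank-5, 100 × 100 matrix A = UΣVᵀ with random orthonormal factors and σr(A)/σ1(A) = 0.9; the seed and the linear spacing of the intermediate singular values are assumptions, as the paper specifies neither.

```python
# Hedged sketch of the synthetic test matrix from the "Experiment Setup" row.
# The random seed and the spacing of singular values between sigma_1 and
# sigma_r are assumptions; the paper fixes only the ratio sigma_r/sigma_1 = 0.9.
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; the paper reports none
m = n = 100                     # matrix size 100 x 100
r = 5                           # target rank

# Random 100 x 5 orthonormal factors via QR of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

# Singular values chosen so that sigma_r(A) / sigma_1(A) = 0.9.
sigma = np.linspace(1.0, 0.9, r)
A = U @ np.diag(sigma) @ V.T    # rank-5 matrix A = U Sigma V^T
```

A reproduction would then initialize the factors X0 ∝ AΦ and Y0 ∝ Φ for a random n × d matrix Φ, using the paper's constants C = 4 and D = Cν/9 with ν = 1e-10, and run alternating gradient descent on ‖A − XYᵀ‖².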