Convergence of Alternating Gradient Descent for Matrix Factorization

Authors: Rachel Ward, Tamara Kolda

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice.
Researcher Affiliation | Collaboration | Rachel Ward (University of Texas, Austin, TX; rward@math.utexas.edu) and Tamara G. Kolda (MathSci.ai, Dublin, CA; tammy.kolda@mathsci.ai)
Pseudocode | No | The paper describes the alternating gradient descent update rule in Assumption 1 but does not present it as a structured pseudocode or algorithm block. (A minimal sketch of the update rule is given below the table.)
Open Source Code | No | The paper does not provide an unambiguous statement or a direct link to a source-code repository for the methodology described in the paper.
Open Datasets | No | The paper uses a synthetic matrix constructed as A = UΣVᵀ, with U and V random 100 × 5 orthonormal matrices, but does not provide access, exact reproduction instructions (e.g., a random seed), or a link to this specific matrix.
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide the hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other machine specifications) used to run its experiments.
Software Dependencies | No | The paper does not provide the ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | We factorize a rank-5 (r = 5) matrix of size 100 × 100. The matrix is constructed as A = UΣVᵀ, with U and V random 100 × 5 orthonormal matrices and singular value ratio σr(A)/σ1(A) = 0.9. ... In all experiments we use the defaults C = 4 and D = Cν/9 with ν = 1e-10 for computing the proposed initialization as well as the theoretical step size. ... Proposed: X0 = (1/(η d C σ1)) AΦ and Y0 = (η D σ1 / n) Φ, where Φ is a random n × d matrix. (A runnable sketch of this setup is given below the table.)
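
Since the paper states the update rule only in prose (Assumption 1), here is a minimal NumPy sketch of one alternating gradient descent step on f(X, Y) = ½‖A − XYᵀ‖²_F. It assumes the standard alternating convention in which Y's update uses the freshly updated X; the function name altgd_step is ours, and this is a sketch, not the authors' code.

```python
import numpy as np

def altgd_step(A, X, Y, eta):
    """One alternating gradient descent step on f(X, Y) = 0.5*||A - X @ Y.T||_F^2.

    X is updated first with Y held fixed; Y is then updated using the new X.
    """
    X = X + eta * (A - X @ Y.T) @ Y      # X <- X - eta * grad_X f(X, Y)
    Y = Y + eta * (A - X @ Y.T).T @ X    # Y <- Y - eta * grad_Y f(X_new, Y)
    return X, Y
```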
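And a corresponding sketch of the quoted experiment setup, reusing altgd_step from above. Several pieces are assumptions rather than details from the paper: Φ is taken as a standard Gaussian n × d matrix, U and V come from QR factorizations of Gaussian matrices, the seed and iteration count are arbitrary (the paper reports neither), and η = 1/(Cσ1) is only a placeholder since the paper's exact theoretical step size is not quoted above.

```python
rng = np.random.default_rng(0)            # arbitrary seed; the paper reports none

m = n = 100                               # matrix size
r = d = 5                                 # target rank / factor width

# A = U Sigma V^T with random 100 x 5 orthonormal U, V and sigma_r/sigma_1 = 0.9.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
sigma = np.linspace(1.0, 0.9, r)          # sigma_1 = 1.0, ..., sigma_r = 0.9
A = U @ np.diag(sigma) @ V.T

# Defaults quoted above: C = 4 and D = C*nu/9 with nu = 1e-10.
C, nu = 4.0, 1e-10
D = C * nu / 9.0
sigma1 = sigma[0]
eta = 1.0 / (C * sigma1)                  # placeholder for the theoretical step size

# Proposed (unbalanced) initialization as reconstructed above, Phi random n x d.
Phi = rng.standard_normal((n, d))
X = A @ Phi / (eta * d * C * sigma1)
Y = (eta * D * sigma1 / n) * Phi

for _ in range(2000):
    X, Y = altgd_step(A, X, Y, eta)

print("relative error:", np.linalg.norm(A - X @ Y.T) / np.linalg.norm(A))
```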