Local and Global Convergence of General Burer-Monteiro Tensor Optimizations

Authors: Shuang Li, Qiuwei Li (pp. 10266-10274)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Numerical Experiments. Computing Infrastructure: All the numerical experiments are performed on a 2018 MacBook Pro with operating system macOS version 10.15.7, processor 2.6 GHz 6-Core Intel Core i7, memory 32 GB, and MATLAB version R2020a. In the first experiment, we illustrate the linear convergence of the gradient descent algorithm within the contraction region dist(U₀, U⋆) ≤ 0.07 m/(M c̄ ω³) in solving the tensor decomposition problem (6), where M = m = 1 in this case. We set n = 64 and vary r over three values, n/2, n, and 3n/2, to get an undercomplete, complete, and overcomplete target tensor T⋆. We generate the r columns of U⋆ independently according to the uniform distribution on the unit sphere and form T⋆ = Σᵢ uᵢ⋆ ⊗ uᵢ⋆ ⊗ uᵢ⋆. According to (Anandkumar, Ge, and Janzamin 2015, Lemmas 25, 31) and Corollary 2, if dist(U₀, U⋆) ≤ 0.07 m/(M c̄ ω³) = 0.07 (because ‖uᵢ⋆‖₂ = 1 implies c = c̄ = ω = 1), gradient descent with a sufficiently small constant stepsize converges linearly to the true factor U⋆. To illustrate this, we initialize the starting point as U⋆ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖_F, and dist(U, U⋆) for a total of 10³ iterations with different stepsizes η in Figure 1, which is consistent with the linear convergence analysis of gradient descent on general Burer-Monteiro tensor optimizations in Corollary 2. In the second experiment, with the same settings as above except varying α, we record the success rate by running 100 trials for each fixed (r, α) pair and declare an instance successful if the final iterate U satisfies dist(U, U⋆) ≤ 10⁻³. We repeat these experiments for α ∈ {0.5, 1, 2, 4, 8, 16}. Table 1 shows that when α is small enough (α ≤ 2), the success rate is 100% for all the undercomplete (r = n/2), complete (r = n), and overcomplete (r = 3n/2) cases; when α is comparatively large (α ∈ [4, 8]), the success rate degrades dramatically as r increases; and when α is larger than a certain threshold, the success rate is 0%. This is consistent with Corollaries 1 and 2.
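Since the paper's MATLAB code is not released, the following is a minimal, hypothetical NumPy sketch of the first experiment. The objective scaling f(U) = ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖²_F, the gradient formula derived from it, the permutation-based definition of dist(·, ·), and the chosen stepsize are illustration assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cp_tensor(U):
    """Symmetric third-order CP reconstruction: sum_i u_i (x) u_i (x) u_i."""
    return np.einsum('ai,bi,ci->abc', U, U, U)

def dist(U, U_star):
    """Column-permutation-aligned distance ||U - U_star P||_F (assumed definition)."""
    cost = np.linalg.norm(U[:, :, None] - U_star[:, None, :], axis=0) ** 2
    row, col = linear_sum_assignment(cost)
    return np.linalg.norm(U[:, row] - U_star[:, col])

rng = np.random.default_rng(0)
n, r, alpha, eta, iters = 64, 32, 0.07, 0.02, 1000   # r = n/2 (undercomplete case)

# Ground truth: r columns drawn uniformly from the unit sphere, T_star = sum_i u_i^(x)3.
U_star = rng.standard_normal((n, r))
U_star /= np.linalg.norm(U_star, axis=0)
T_star = cp_tensor(U_star)

# Perturbed initialization U_0 = U_star + alpha * D with ||D||_F = 1.
D = rng.standard_normal((n, r))
U = U_star + alpha * D / np.linalg.norm(D)

for t in range(iters):
    R = cp_tensor(U) - T_star                          # residual tensor
    grad = 6.0 * np.einsum('abc,bi,ci->ai', R, U, U)   # gradient of ||R||_F^2 w.r.t. U
    U -= eta * grad
    if t % 100 == 0:
        print(t, np.linalg.norm(grad), np.linalg.norm(R), dist(U, U_star))
```

The einsum in the loop uses the fact that, for a symmetric residual tensor R, the gradient of ‖R‖²_F with respect to the i-th column uᵢ is 6 R(·, uᵢ, uᵢ).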
Researcher Affiliation | Collaboration | Shuang Li (1), Qiuwei Li (2); (1) Department of Mathematics, University of California, Los Angeles; (2) Alibaba Group (US), Damo Academy; shuangli@math.ucla.edu, liqiuweiss@gmail.com
Pseudocode | Yes | Algorithm 1: Iterative Gradient Descent for Tensor Decomposition
Input: T. Initialization: T̃ = T, û = 0. Output: Estimated factors {ûᵢ}.
1: Let i = 0.
2: while T̃ ≠ 0 do
3:   if û ≠ 0 then
4:     i = i + 1.
5:     ûᵢ = û.
6:     T̃ ← T̃ - (⟨T̃, û ⊗ û ⊗ û⟩ / ‖û‖₂⁶) û ⊗ û ⊗ û.
7:   end if
8:   Find a second-order stationary point û of g(u) = ‖u ⊗ u ⊗ u - T̃‖²_F.
9: end while
10: return solution
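Algorithm 1 can be sketched in the same hypothetical NumPy style. The "find a second-order stationary point" step is approximated here by plain gradient descent on g(u) = ‖u ⊗ u ⊗ u - T̃‖²_F from a random initialization, and the tolerances, stepsize, and component cap are assumptions rather than values from the paper.

```python
import numpy as np

def rank1(u):
    """Rank-1 symmetric third-order tensor u (x) u (x) u."""
    return np.einsum('a,b,c->abc', u, u, u)

def find_stationary_point(T, eta=0.01, iters=2000, seed=None):
    """Stand-in for the SOSP step: gradient descent on g(u) = ||u (x) u (x) u - T||_F^2."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        R = rank1(u) - T
        u = u - eta * 6.0 * np.einsum('abc,b,c->a', R, u, u)
    return u

def decompose(T, tol=1e-6, max_components=100):
    """Iteratively extract rank-1 components and deflate the residual tensor."""
    T_res = T.copy()
    factors = []
    while np.linalg.norm(T_res) > tol and len(factors) < max_components:
        u = find_stationary_point(T_res, seed=len(factors))
        if np.linalg.norm(u) < 1e-8:   # stationary point collapsed to zero: nothing left to extract
            break
        factors.append(u)
        # Deflation (step 6): subtract the projection of T_res onto u (x) u (x) u.
        coeff = np.einsum('abc,a,b,c->', T_res, u, u, u) / np.linalg.norm(u) ** 6
        T_res = T_res - coeff * rank1(u)
    return factors
```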
Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology.
Open Datasets | No | The paper states 'We generate the r columns of U⋆ independently according to the uniform distribution on the unit sphere and form T⋆ = Σᵢ uᵢ⋆ ⊗ uᵢ⋆ ⊗ uᵢ⋆'. This indicates synthetic data generation, and no concrete access information for a publicly available or open dataset is provided.
Dataset Splits | No | The paper describes experiments where data is synthetically generated and metrics are recorded over iterations and trials. It does not provide specific dataset split information (percentages, counts, or predefined splits) for training, validation, or testing.
Hardware Specification | Yes | All the numerical experiments are performed on a 2018 MacBook Pro with operating system macOS version 10.15.7, processor 2.6 GHz 6-Core Intel Core i7, memory 32 GB, and MATLAB version R2020a.
Software Dependencies | Yes | MATLAB version R2020a.
Experiment Setup | Yes | In the first experiment, we illustrate the linear convergence of the gradient descent algorithm... We set n = 64 and vary r with three different values: n/2, n, 3n/2... To illustrate this, we initialize the starting point as U⋆ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖_F, and dist(U, U⋆) for a total of 10³ iterations with different stepsizes η in Figure 1... In the second experiment, with the same settings as above except varying α, we record the success rate by running 100 trials for each fixed (r, α) pair... Table 1: Success ratio with η = 0.02 (top), η = 0.04 (middle), and η = 0.06 (bottom).
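For the second (success-rate) experiment, a self-contained hypothetical NumPy sketch of one (r, α) sweep is given below; it restates the helpers from the earlier sketch so it runs on its own, and the objective scaling, distance definition, and divergence guard are again assumptions. Running the full grid with 100 trials per pair is computationally heavy, so the trial and iteration counts can be reduced for a quick check.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cp_tensor(U):
    return np.einsum('ai,bi,ci->abc', U, U, U)

def dist(U, U_star):
    cost = np.linalg.norm(U[:, :, None] - U_star[:, None, :], axis=0) ** 2
    row, col = linear_sum_assignment(cost)
    return np.linalg.norm(U[:, row] - U_star[:, col])

def trial(n, r, alpha, eta, iters, rng):
    """One random instance: success if gradient descent lands within 1e-3 of U_star."""
    U_star = rng.standard_normal((n, r))
    U_star /= np.linalg.norm(U_star, axis=0)
    T_star = cp_tensor(U_star)
    D = rng.standard_normal((n, r))
    U = U_star + alpha * D / np.linalg.norm(D)
    for _ in range(iters):
        R = cp_tensor(U) - T_star
        U -= eta * 6.0 * np.einsum('abc,bi,ci->ai', R, U, U)
    if not np.isfinite(U).all():       # diverged run counts as a failure
        return False
    return dist(U, U_star) <= 1e-3

# Paper settings: n = 64, 100 trials per (r, alpha) pair; reduce for a quick check.
n, eta, iters, n_trials = 64, 0.02, 1000, 100
rng = np.random.default_rng(0)
for r in (n // 2, n, 3 * n // 2):
    for alpha in (0.5, 1, 2, 4, 8, 16):
        rate = np.mean([trial(n, r, alpha, eta, iters, rng) for _ in range(n_trials)])
        print(f"r={r:3d}  alpha={alpha:5.1f}  success rate={rate:.2f}")
```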