Local and Global Convergence of General Burer-Monteiro Tensor Optimizations
Authors: Shuang Li, Qiuwei Li (pp. 10266-10274)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Numerical Experiments): All the numerical experiments are performed on a 2018 MacBook Pro with macOS version 10.15.7, a 2.6 GHz 6-Core Intel Core i7 processor, 32 GB of memory, and MATLAB R2020a. In the first experiment, we illustrate the linear convergence of the gradient descent algorithm within the contraction region dist(U⁰, U★) ≤ 0.07 m/(M c̄ ω³) in solving the tensor decomposition problem (6), where M = m = 1 in this case. We set n = 64 and vary r over three values, n/2, n, and 3n/2, to obtain an undercomplete, complete, and overcomplete target tensor T★. We generate the r columns of U★ independently according to the uniform distribution on the unit sphere and form T★ = Σᵢ u★ᵢ ⊗ u★ᵢ ⊗ u★ᵢ. According to (Anandkumar, Ge, and Janzamin 2015, Lemmas 25, 31) and Corollary 2, if dist(U⁰, U★) ≤ 0.07 m/(M c̄ ω³) = 0.07 (because ‖u★ᵢ‖₂ = 1 implies c = c̄ = ω = 1), gradient descent with a sufficiently small constant stepsize converges linearly to the true factor U★. To illustrate this, we initialize the starting point as U★ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ − T★‖_F, and dist(U, U★) over 10³ iterations with different stepsizes η in Figure 1, which is consistent with the linear convergence analysis of gradient descent on general Burer-Monteiro tensor optimizations in Corollary 2. In the second experiment, with the same settings as above except varying α, we record the success rate over 100 trials for each fixed (r, α) pair, declaring an instance successful if the final iterate U satisfies dist(U, U★) ≤ 10⁻³. We repeat these experiments for α ∈ {0.5, 1, 2, 4, 8, 16}. Table 1 shows that when α is small enough (α ≤ 2), the success rate is 100% for the undercomplete (r = n/2), complete (r = n), and overcomplete (r = 3n/2) cases; when α is comparatively large (α ∈ [4, 8]), the success rate degrades dramatically as r increases; and when α exceeds a certain threshold, the success rate is 0%. This is consistent with Corollaries 1 and 2. (A gradient-descent sketch of this setup is given after the table.) |
| Researcher Affiliation | Collaboration | Shuang Li (1), Qiuwei Li (2); (1) Department of Mathematics, University of California, Los Angeles; (2) Alibaba Group (US), DAMO Academy; shuangli@math.ucla.edu, liqiuweiss@gmail.com |
| Pseudocode | Yes | Algorithm 1: Iterative Gradient Descent for Tensor Decomposition. Input: T. Initialization: T̃ = T, û = 0. Output: estimated factors {uᵢ}. 1: Let i = 0. 2: while T̃ ≠ 0 do 3: if û ≠ 0 then 4: i = i + 1. 5: uᵢ = û. 6: T̃ ← T̃ − (⟨T̃, û ⊗ û ⊗ û⟩/‖û‖₂⁶) û ⊗ û ⊗ û. 7: end if 8: Find a second-order stationary point û of g(u) = ‖u ⊗ u ⊗ u − T̃‖²_F. 9: end while 10: return solution. (A runnable sketch of this deflation loop is given after the table.) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology. |
| Open Datasets | No | The paper states 'We generate the r columns of U★ independently according to the uniform distribution on the unit sphere and form T★ = Σᵢ u★ᵢ ⊗ u★ᵢ ⊗ u★ᵢ'. This indicates synthetic data generation, and no concrete access information for a publicly available or open dataset is provided. |
| Dataset Splits | No | The paper describes experiments where data is synthetically generated and metrics are recorded over iterations and trials. It does not provide specific dataset split information (percentages, counts, or predefined splits) for training, validation, or testing. |
| Hardware Specification | Yes | All the numerical experiments are performed on a 2018 MacBook Pro with macOS version 10.15.7, a 2.6 GHz 6-Core Intel Core i7 processor, 32 GB of memory, and MATLAB R2020a. |
| Software Dependencies | Yes | MATLAB R2020a. |
| Experiment Setup | Yes | In the first experiment, we illustrate the linear convergence of the gradient descent algorithm... We set n = 64 and vary r over three values: n/2, n, 3n/2... To illustrate this, we initialize the starting point as U★ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ − T★‖_F, and dist(U, U★) over 10³ iterations with different stepsizes η in Figure 1... In the second experiment, with the same settings as above except varying α, we record the success rate over 100 trials for each fixed (r, α) pair... Table 1: Success ratio with η = 0.02 (top), η = 0.04 (middle), and η = 0.06 (bottom). (A success-rate sketch following this protocol is given after the table.) |
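
To make the convergence experiment concrete, the following MATLAB sketch reproduces the described setup under stated assumptions: the objective is taken to be f(U) = ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ − T★‖²_F, plain gradient descent with a constant stepsize is used, and the helper names sym_tensor and grad_f are illustrative rather than the authors' code.

```matlab
% Minimal sketch of the first experiment (assumed objective
% f(U) = ||sum_i u_i (x) u_i (x) u_i - Tstar||_F^2; not the authors' code).
n = 64; r = n/2; eta = 0.02; alpha = 0.07; num_iter = 1e3;

Ustar = randn(n, r); Ustar = Ustar ./ vecnorm(Ustar);   % unit-norm true factors
Tstar = sym_tensor(Ustar);                              % target tensor T*

D = randn(n, r); D = D / norm(D, 'fro');                % perturbation with ||D||_F = 1
U = Ustar + alpha * D;                                  % initialization U0 = U* + alpha*D

grad_norm = zeros(num_iter, 1); resid = zeros(num_iter, 1);
for t = 1:num_iter
    G = grad_f(U, Tstar);
    grad_norm(t) = norm(G, 'fro');                      % ||grad f(U)||_F
    resid(t) = norm(sym_tensor(U) - Tstar, 'fro');      % tensor reconstruction error
    U = U - eta * G;                                    % gradient step
end
semilogy(grad_norm); hold on; semilogy(resid);          % linear convergence shows as straight lines

function T = sym_tensor(U)
% T(a,b,c) = sum_i U(a,i)*U(b,i)*U(c,i)
    [n, r] = size(U); T = zeros(n, n, n);
    for i = 1:r
        u = U(:, i);
        T = T + reshape(kron(u, kron(u, u)), n, n, n);
    end
end

function G = grad_f(U, Tstar)
% Gradient of ||sum_i u_i (x) u_i (x) u_i - Tstar||_F^2: column j is 6*E(:, u_j, u_j),
% where E is the (symmetric) residual tensor and E1 its mode-1 unfolding.
    [n, r] = size(U);
    E1 = reshape(sym_tensor(U) - Tstar, n, n * n);
    G = zeros(n, r);
    for j = 1:r
        G(:, j) = 6 * E1 * kron(U(:, j), U(:, j));
    end
end
```

With a small enough η, the recorded quantities should decay geometrically on the semilog plot, which is the linear-convergence behavior the paper reports in Figure 1.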
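Algorithm 1 is a deflation loop: repeatedly fit one rank-one component of the working tensor and subtract it. The sketch below is one way to instantiate it; gradient descent from a random unit-norm start stands in for "find a second-order stationary point", a small tolerance replaces the exact test T̃ = 0, and orthonormal true factors are used so that greedy deflation is well posed. All of these are assumptions for illustration, not the paper's implementation.

```matlab
% Sketch of Algorithm 1 (deflation loop). Assumptions: gradient descent from a random
% start replaces "find a second-order stationary point"; tol replaces the exact test T~ = 0.
n = 16; r = 4; tol = 1e-3; eta = 0.02; inner_iter = 5e3;

[Ustar, ~] = qr(randn(n, r), 0);                   % orthonormal true factors (assumption)
Tcur = sym_tensor(Ustar);                          % working tensor T~, initialized to T

U_est = zeros(n, 0);
while norm(Tcur(:)) > tol && size(U_est, 2) < 2 * r    % guard against non-convergence
    u = randn(n, 1); u = u / norm(u);              % random start for the rank-one subproblem
    for t = 1:inner_iter                           % descent on g(u) = ||u (x) u (x) u - T~||_F^2
        E1 = reshape(sym_tensor(u) - Tcur, n, n * n);
        u = u - eta * (6 * E1 * kron(u, u));
    end
    U_est = [U_est, u];                            % record the recovered factor
    lam = Tcur(:)' * reshape(sym_tensor(u), [], 1) / norm(u)^6;   % <T~, u (x) u (x) u> / ||u||^6
    Tcur = Tcur - lam * sym_tensor(u);             % deflation: remove the fitted rank-one part
end

function T = sym_tensor(U)
% T(a,b,c) = sum_i U(a,i)*U(b,i)*U(c,i)
    [n, r] = size(U); T = zeros(n, n, n);
    for i = 1:r
        u = U(:, i);
        T = T + reshape(kron(u, kron(u, u)), n, n, n);
    end
end
```

Note that at an exact stationary point of g the first-order condition gives ⟨T̃, û ⊗ û ⊗ û⟩ = ‖û‖⁶, so in this sketch the deflation step is then equivalent to subtracting û ⊗ û ⊗ û itself, i.e., one fitted rank-one component per pass.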
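The success-rate protocol of the second experiment (many independent trials per (r, α) pair, with success declared when the final iterate satisfies dist(U, U★) ≤ 10⁻³) could be organized as follows. The distance used here greedily matches each true column to its nearest estimated column; this, and the reuse of the illustrative helpers sym_tensor and grad_f, are assumptions, since the quoted text does not spell out the definition of dist.

```matlab
% Sketch of the success-rate protocol (assumptions: dist measured by greedy column
% matching; success declared when dist <= 1e-3; not the authors' code).
n = 64; eta = 0.02; num_iter = 1e3; num_trials = 100;
r_list = [n/2, n, 3*n/2]; alpha_list = [0.5, 1, 2, 4, 8, 16];

success = zeros(numel(r_list), numel(alpha_list));
for a = 1:numel(r_list)
    for b = 1:numel(alpha_list)
        for trial = 1:num_trials
            Ustar = randn(n, r_list(a)); Ustar = Ustar ./ vecnorm(Ustar);
            Tstar = sym_tensor(Ustar);
            D = randn(n, r_list(a)); D = D / norm(D, 'fro');
            U = Ustar + alpha_list(b) * D;                 % perturbed initialization
            for t = 1:num_iter
                U = U - eta * grad_f(U, Tstar);            % gradient descent as in Experiment 1
            end
            success(a, b) = success(a, b) + (col_dist(U, Ustar) <= 1e-3) / num_trials;
        end
    end
end
disp(success)                                              % rows: r values; columns: alpha values

function d = col_dist(U, Ustar)
% Assumed distance: match each true column to its nearest estimated column; the paper's
% exact definition of dist(U, Ustar) (e.g. an optimal permutation) may differ.
    d2 = 0;
    for i = 1:size(Ustar, 2)
        d2 = d2 + min(vecnorm(U - Ustar(:, i)))^2;
    end
    d = sqrt(d2);
end

function T = sym_tensor(U)
% T(a,b,c) = sum_i U(a,i)*U(b,i)*U(c,i)
    [n, r] = size(U); T = zeros(n, n, n);
    for i = 1:r
        u = U(:, i);
        T = T + reshape(kron(u, kron(u, u)), n, n, n);
    end
end

function G = grad_f(U, Tstar)
% Gradient of ||sum_i u_i (x) u_i (x) u_i - Tstar||_F^2 with respect to U.
    [n, r] = size(U);
    E1 = reshape(sym_tensor(U) - Tstar, n, n * n);
    G = zeros(n, r);
    for j = 1:r
        G(:, j) = 6 * E1 * kron(U(:, j), U(:, j));
    end
end
```

As written the full sweep is expensive (each trial rebuilds a 64 × 64 × 64 tensor at every iteration); for a quick check, shrink num_trials and num_iter.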