Local and Global Convergence of General Burer-Monteiro Tensor Optimizations

Authors: Shuang Li, Qiuwei Li (pp. 10266-10274)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Numerical Experiments. Computing Infrastructure: All the numerical experiments are performed on a 2018 MacBook Pro with operating system macOS version 10.15.7, processor 2.6 GHz 6-Core Intel Core i7, memory 32 GB, and MATLAB version R2020a. In the first experiment, we illustrate the linear convergence of the gradient descent algorithm within the contraction region dist(U₀, U⋆) ≤ 0.07 m/(M c̄ ω³) in solving the tensor decomposition problem (6), where M = m = 1 in this case. We set n = 64 and vary r over three values, n/2, n, and 3n/2, to get an undercomplete, complete, and overcomplete target tensor T⋆. We generate the r columns of U⋆ independently according to the uniform distribution on the unit sphere and form T⋆ = Σᵢ uᵢ⋆ ⊗ uᵢ⋆ ⊗ uᵢ⋆. According to (Anandkumar, Ge, and Janzamin 2015, Lemmas 25, 31) and Corollary 2, if dist(U₀, U⋆) ≤ 0.07 m/(M c̄ ω³) = 0.07 (because ‖uᵢ⋆‖₂ = 1 implies c = c̄ = ω = 1), gradient descent with a sufficiently small constant stepsize converges linearly to the true factor U⋆. To illustrate this, we initialize the starting point as U⋆ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖_F, and dist(U, U⋆) for a total of 10³ iterations with different stepsizes η in Figure 1, which is consistent with the linear convergence analysis of gradient descent on general Burer-Monteiro tensor optimizations in Corollary 2. In the second experiment, with the same settings as above except varying α, we record the success rate by running 100 trials for each fixed (r, α) pair and declare an instance successful if the final iterate U satisfies dist(U, U⋆) ≤ 10⁻³. We repeat these experiments for α ∈ {0.5, 1, 2, 4, 8, 16}. Table 1 shows that when α is small enough (α ≤ 2), the success rate is 100% for all the undercomplete (r = n/2), complete (r = n), and overcomplete (r = 3n/2) cases; when α is comparatively large (α ∈ [4, 8]), the success rate degrades dramatically as r increases; and when α is larger than a certain threshold, the success rate is 0%. This is consistent with Corollaries 1 and 2.
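Since the paper's MATLAB code is not released, the following is a minimal, hypothetical NumPy sketch of the first experiment. The objective scaling f(U) = ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖²_F, the gradient formula derived from it, the permutation-based definition of dist(·, ·), and the chosen stepsize are illustration assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cp_tensor(U):
    """Symmetric third-order CP reconstruction: sum_i u_i (x) u_i (x) u_i."""
    return np.einsum('ai,bi,ci->abc', U, U, U)

def dist(U, U_star):
    """Column-permutation-aligned distance ||U - U_star P||_F (assumed definition)."""
    cost = np.linalg.norm(U[:, :, None] - U_star[:, None, :], axis=0) ** 2
    row, col = linear_sum_assignment(cost)
    return np.linalg.norm(U[:, row] - U_star[:, col])

rng = np.random.default_rng(0)
n, r, alpha, eta, iters = 64, 32, 0.07, 0.02, 1000   # r = n/2 (undercomplete case)

# Ground truth: r columns drawn uniformly from the unit sphere, T_star = sum_i u_i^(x)3.
U_star = rng.standard_normal((n, r))
U_star /= np.linalg.norm(U_star, axis=0)
T_star = cp_tensor(U_star)

# Perturbed initialization U_0 = U_star + alpha * D with ||D||_F = 1.
D = rng.standard_normal((n, r))
U = U_star + alpha * D / np.linalg.norm(D)

for t in range(iters):
    R = cp_tensor(U) - T_star                          # residual tensor
    grad = 6.0 * np.einsum('abc,bi,ci->ai', R, U, U)   # gradient of ||R||_F^2 w.r.t. U
    U -= eta * grad
    if t % 100 == 0:
        print(t, np.linalg.norm(grad), np.linalg.norm(R), dist(U, U_star))
```

The einsum in the loop uses the fact that, for a symmetric residual tensor R, the gradient of ‖R‖²_F with respect to the i-th column uᵢ is 6 R(·, uᵢ, uᵢ).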
Researcher Affiliation | Collaboration | Shuang Li (1), Qiuwei Li (2); (1) Department of Mathematics, University of California, Los Angeles; (2) Alibaba Group (US), Damo Academy; shuangli@math.ucla.edu, liqiuweiss@gmail.com
Pseudocode | Yes | Algorithm 1: Iterative Gradient Descent for Tensor Decomposition
Input: T. Initialization: T̃ = T, û = 0. Output: Estimated factors {ûᵢ}.
1: Let i = 0.
2: while T̃ ≠ 0 do
3:   if û ≠ 0 then
4:     i = i + 1.
5:     ûᵢ = û.
6:     T̃ ← T̃ - (⟨T̃, û ⊗ û ⊗ û⟩ / ‖û‖₂⁶) û ⊗ û ⊗ û.
7:   end if
8:   Find a second-order stationary point û of g(u) = ‖u ⊗ u ⊗ u - T̃‖²_F.
9: end while
10: return solution
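Algorithm 1 can be sketched in the same hypothetical NumPy style. The "find a second-order stationary point" step is approximated here by plain gradient descent on g(u) = ‖u ⊗ u ⊗ u - T̃‖²_F from a random initialization, and the tolerances, stepsize, and component cap are assumptions rather than values from the paper.

```python
import numpy as np

def rank1(u):
    """Rank-1 symmetric third-order tensor u (x) u (x) u."""
    return np.einsum('a,b,c->abc', u, u, u)

def find_stationary_point(T, eta=0.01, iters=2000, seed=None):
    """Stand-in for the SOSP step: gradient descent on g(u) = ||u (x) u (x) u - T||_F^2."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        R = rank1(u) - T
        u = u - eta * 6.0 * np.einsum('abc,b,c->a', R, u, u)
    return u

def decompose(T, tol=1e-6, max_components=100):
    """Iteratively extract rank-1 components and deflate the residual tensor."""
    T_res = T.copy()
    factors = []
    while np.linalg.norm(T_res) > tol and len(factors) < max_components:
        u = find_stationary_point(T_res, seed=len(factors))
        if np.linalg.norm(u) < 1e-8:   # stationary point collapsed to zero: nothing left to extract
            break
        factors.append(u)
        # Deflation (step 6): subtract the projection of T_res onto u (x) u (x) u.
        coeff = np.einsum('abc,a,b,c->', T_res, u, u, u) / np.linalg.norm(u) ** 6
        T_res = T_res - coeff * rank1(u)
    return factors
```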
Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology.
Open Datasets | No | The paper states 'We generate the r columns of U⋆ independently according to the uniform distribution on the unit sphere and form T⋆ = Σᵢ uᵢ⋆ ⊗ uᵢ⋆ ⊗ uᵢ⋆'. This indicates synthetic data generation, and no concrete access information for a publicly available or open dataset is provided.
Dataset Splits | No | The paper describes experiments where data is synthetically generated and metrics are recorded over iterations and trials. It does not provide specific dataset split information (percentages, counts, or predefined splits) for training, validation, or testing.
Hardware Specification | Yes | All the numerical experiments are performed on a 2018 MacBook Pro with operating system macOS version 10.15.7, processor 2.6 GHz 6-Core Intel Core i7, memory 32 GB, and MATLAB version R2020a.
Software Dependencies | Yes | MATLAB version R2020a.
Experiment Setup | Yes | In the first experiment, we illustrate the linear convergence of the gradient descent algorithm... We set n = 64 and vary r with three different values: n/2, n, 3n/2... To illustrate this, we initialize the starting point as U⋆ + αD with α = 0.07 and set D as a normalized Gaussian matrix with ‖D‖_F = 1. We record the three metrics ‖∇f(U)‖_F, ‖Σᵢ uᵢ ⊗ uᵢ ⊗ uᵢ - T⋆‖_F, and dist(U, U⋆) for a total of 10³ iterations with different stepsizes η in Figure 1... In the second experiment, with the same settings as above except varying α, we record the success rate by running 100 trials for each fixed (r, α) pair... Table 1: Success ratio with η = 0.02 (top), η = 0.04 (middle), and η = 0.06 (bottom).
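For the second (success-rate) experiment, a self-contained hypothetical NumPy sketch of one (r, α) sweep is given below; it restates the helpers from the earlier sketch so it runs on its own, and the objective scaling, distance definition, and divergence guard are again assumptions. Running the full grid with 100 trials per pair is computationally heavy, so the trial and iteration counts can be reduced for a quick check.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cp_tensor(U):
    return np.einsum('ai,bi,ci->abc', U, U, U)

def dist(U, U_star):
    cost = np.linalg.norm(U[:, :, None] - U_star[:, None, :], axis=0) ** 2
    row, col = linear_sum_assignment(cost)
    return np.linalg.norm(U[:, row] - U_star[:, col])

def trial(n, r, alpha, eta, iters, rng):
    """One random instance: success if gradient descent lands within 1e-3 of U_star."""
    U_star = rng.standard_normal((n, r))
    U_star /= np.linalg.norm(U_star, axis=0)
    T_star = cp_tensor(U_star)
    D = rng.standard_normal((n, r))
    U = U_star + alpha * D / np.linalg.norm(D)
    for _ in range(iters):
        R = cp_tensor(U) - T_star
        U -= eta * 6.0 * np.einsum('abc,bi,ci->ai', R, U, U)
    if not np.isfinite(U).all():       # diverged run counts as a failure
        return False
    return dist(U, U_star) <= 1e-3

# Paper settings: n = 64, 100 trials per (r, alpha) pair; reduce for a quick check.
n, eta, iters, n_trials = 64, 0.02, 1000, 100
rng = np.random.default_rng(0)
for r in (n // 2, n, 3 * n // 2):
    for alpha in (0.5, 1, 2, 4, 8, 16):
        rate = np.mean([trial(n, r, alpha, eta, iters, rng) for _ in range(n_trials)])
        print(f"r={r:3d}  alpha={alpha:5.1f}  success rate={rate:.2f}")
```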