Fast Hybrid Algorithm for Big Matrix Recovery
Authors: Tengfei Zhou, Hui Qian, Zebang Shen, Congfu Xu
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments on large-scale collaborative filtering datasets demonstrate very competitive performance of these fast hybrid methods compared to the state-of-the-arts. We validate the performance of our hybrid methods by conducting empirical study on synthetic and real matrix recovery tasks. |
| Researcher Affiliation | Academia | Tengfei Zhou, Hui Qian, Zebang Shen, Congfu Xu. College of Computer Science and Technology, Zhejiang University, China. {zhoutengfei,qianhui,shenzebang,xucongfu}@zju.edu.cn |
| Pseudocode | Yes | Algorithm 1 (NT: Riemannian Newton method for problem (9)). Input: rank r, the polar representation X(0) of matrix X0. Output: polar representation of a local optimum. 1: k = 0; 2: repeat; 3: solve the Newton equation (12) for ζ(k) ∈ H_X(k) by tCG: Hess F(X(k))[ζ(k)] = -grad F(X(k)); 4: set X(k+1) = R_X(k)(ζ(k)); 5: k = k + 1; 6: until convergence; 7: return X(k). Algorithm 2 (CG): Riemannian conjugate gradient descent for problem (9). Algorithm 3: HADMNT (or HADMCG) for problem (1). |
| Open Source Code | No | The paper states: "The codes of baselines except for ADMM are download from the homepages of their authors." This refers to third-party code for comparison, not the authors' own implementation described in the paper. No statement or link is provided for the authors' code. |
| Open Datasets | Yes | The three largest publicly available datasets are used: MovieLens 10M (ML10M) (Herlocker et al. 1999), Netflix (KDD Cup 2007), and Yahoo Music (Dror et al. 2012). Table 2 (statistics of datasets): ML10M: 69,878 users, 10,677 items, 10,000,054 ratings; Netflix: 2,649,429 users, 17,770 items, 100,480,507 ratings; Yahoo: 1,000,990 users, 624,961 items, 252,800,275 ratings. We randomly partition the datasets into two groups: 80% ratings for training and the remaining 20% ratings for testing. |
| Dataset Splits | No | The paper mentions a "80% ratings for training and the remaining 20% ratings for testing" split but does not specify a validation set or validation split percentages/counts. |
| Hardware Specification | Yes | All experiments are conducted on the same machine with an Intel Xeon E5-2690 3.0GHz CPU and 128GB RAM. |
| Software Dependencies | No | The paper mentions using a MATLAB command for generating singular values in the ill-conditioned case, but it does not specify the MATLAB version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | For fairness, all six solvers are initialized by X(0) = 0, and their regularization parameters λ are set to identical values. In our simulation, we set both m and n to 5000 and the rank r to 50. The oversampling ratio c is fixed at 3. In the noiseless case, both curves can indicate the rate of convergence; from Figure 2(a) one can see that HADMCG and HADMNT converge superlinearly to the optimum and are significantly faster than the other baselines. In the noiseless case, we set λ = 10^-10 for all solvers. We set the SNR to 0.01. The regularization parameter λ is set to 0.04·(m log m)/p, where p is the number of sampled elements. We impose exponential decay on the singular values (the singular values are generated with a specified condition number (CN) by the MATLAB command 1000*logspace(log10(CN), 0, 50), where CN is set to 10^6). Moreover, we perturb the observed entries by Gaussian noise with SNR = 0.01. The regularization parameter is set to the same value as in the noisy case. We randomly partition the datasets into two groups: 80% of the ratings for training and the remaining 20% for testing. We repeat the experiments 5 times and report the average testing RMSE and CPU time. In our experiments, the regularization parameters λ of the six NNLS solvers (including our hybrid solvers) are set to identical values. We set λ = 20 for both ML10M and Netflix, and 200 for Yahoo Music. The six NNLS solvers and the non-convex method RP are initialized by 0. The rank parameters r of the fixed-rank methods (namely LMaFit, LRGeomCG, and R3MC) are set to the rank estimated by our hybrid methods. Since 0 is not a valid initializer for the fixed-rank methods, LMaFit, LRGeomCG, and R3MC are initialized by the top-r SVD of the training matrix. We terminate these methods once a pre-specified training RMSE is achieved or they iterate more than 500 times. |
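To make the quoted synthetic protocol concrete, here is a minimal Python/NumPy sketch of the data generation and split it describes: a rank-r matrix with exponentially decaying singular values of a given condition number (mirroring the paper's MATLAB logspace call), uniform sampling at an oversampling ratio, and the random 80/20 rating split. The function names and the small default sizes (the paper uses m = n = 5000) are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_ill_conditioned(m=500, n=500, r=50, cn=1e6, seed=0):
    """Rank-r matrix whose nonzero singular values decay exponentially.

    Mirrors the paper's MATLAB-style call 1000*logspace(log10(CN), 0, r),
    so the condition number of the nonzero spectrum is CN.
    """
    rng = np.random.default_rng(seed)
    s = 1000.0 * np.logspace(np.log10(cn), 0, r)   # 1000*CN down to 1000
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))  # random orthonormal factors
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return (U * s) @ V.T, s

def sample_entries(M, oversampling=3, r=50, seed=0):
    """Uniformly observe p = oversampling * r * (m + n - r) entries."""
    m, n = M.shape
    dof = r * (m + n - r)              # degrees of freedom of a rank-r matrix
    p = min(oversampling * dof, m * n)
    rng = np.random.default_rng(seed)
    idx = rng.choice(m * n, size=p, replace=False)
    rows, cols = np.unravel_index(idx, (m, n))
    return rows, cols, M[rows, cols]

def train_test_split(rows, cols, vals, train_frac=0.8, seed=0):
    """Random 80/20 partition of the observed ratings, as in the experiments."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(vals))
    cut = int(train_frac * len(vals))
    tr, te = perm[:cut], perm[cut:]
    return (rows[tr], cols[tr], vals[tr]), (rows[te], cols[te], vals[te])
```

Under this sketch, the reported "oversampling ratio c = 3" fixes the number of observed entries relative to the degrees of freedom of the rank-50 target, and the split touches only the observed ratings, matching the quoted 80%/20% protocol.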