CUR Algorithm for Partially Observed Matrices

Authors: Miao Xu, Rong Jin, Zhi-Hua Zhou

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical studies on both synthetic and real-world datasets verify our theoretical claims and demonstrate the effectiveness of the proposed algorithm."
Researcher Affiliation | Collaboration | Miao Xu (XUM@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, and Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China; Rong Jin (RONGJIN@CSE.MSU.EDU), Institute of Data Science and Technologies at Alibaba Group, Seattle, USA; Zhi-Hua Zhou (ZHOUZH@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, and Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China.
Pseudocode | No | The paper describes the CUR+ algorithm through prose and mathematical formulations but does not provide a dedicated pseudocode block or algorithm listing (a hedged reconstruction is sketched after this table).
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is publicly available. It mentions only future plans: 'in the future, we plan to combine the proposed algorithm with adaptive sampling strategy to further reduce the error bound. We also plan to exploit the recent studies on matrix approximation/completion with non-uniform sampling and extend the CUR algorithm to the case when observed entries are non-uniform sampled.'
Open Datasets | Yes | "We evaluate the performance of the proposed CUR+ algorithm on several benchmark data sets that have been used in the recent studies of the CUR matrix decomposition algorithm, including Enron emails (39,861 × 28,102), Dexter (20,000 × 2,600), Farm Ads (54,877 × 4,143) and Gisette (13,500 × 5,000)... Detailed information of these data sets can be found in (Wang & Zhang, 2013)."
Dataset Splits | No | The paper describes how partial observations are created by randomly sampling entries of the matrix for the approximation task, but it does not specify explicit training/validation/test splits (e.g., percentages or distinct sets) in the traditional supervised-learning sense; performance is measured against the full matrix rather than a held-out validation set.
Hardware Specification | Yes | 'all the experiments were run on a Linux server with CPU 2.53GHz and 48GB memory.'
Software Dependencies | No | The paper states, 'We implement the proposed algorithm using Matlab,' but it does not specify version numbers for Matlab or any other software dependency.
Experiment Setup | Yes | "To make our result comparable to the previous studies, we adopted the same experimental strategy as in (Wang & Zhang, 2012; 2013). More specifically, for each data set, we set d1 = αr and d2 = αd1, with the rank r varied over {10, 20, 50} and α set to 5. To create partial observations, we randomly sample |Ω| = Ω_0 = nmr²/nnz(M) entries from the target matrix M, where nnz(M) is the number of non-zero entries of M. We measure the performance of low-rank matrix approximation by the relative spectral-norm error ℓ_s = ‖M̂ − M‖₂ / ‖M − M_r‖₂, which has a solid theoretical guarantee according to Theorem 3. To make a fair comparison with previous work measured in the Frobenius norm, we also report the relative Frobenius-norm error ℓ_F = ‖M̂ − M‖_F / ‖M − M_r‖_F. Finally, we follow the experimental protocol specified in (Wang & Zhang, 2012) by repeating every experiment 10 times and reporting the mean value." (Both error measures and the observed-entry budget are sketched in code below.)
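Since the paper provides no pseudocode, the following is a minimal NumPy sketch of the CUR+ procedure as the paper's prose describes it: sample d1 fully observed columns and d2 fully observed rows, take the top-r singular vectors of each, and fit the r × r core matrix by least squares over the randomly observed entries Ω. All names here (cur_plus, omega, U_hat, etc.) are our own; this is an illustrative reconstruction under those assumptions, not the authors' Matlab implementation.

```python
import numpy as np

def cur_plus(M, r, d1, d2, omega, rng):
    """Illustrative CUR+ sketch. M is passed in full only so the sampled
    columns/rows and the observed entries in omega can be read off; the
    reconstruction uses nothing else. omega is a list of (i, j) pairs."""
    n, m = M.shape
    cols = rng.choice(m, size=d1, replace=False)  # d1 fully observed columns
    rows = rng.choice(n, size=d2, replace=False)  # d2 fully observed rows
    C, R = M[:, cols], M[rows, :]

    # Top-r left singular vectors of C and top-r right singular vectors of R.
    U_hat = np.linalg.svd(C, full_matrices=False)[0][:, :r]    # n x r
    V_hat = np.linalg.svd(R, full_matrices=False)[2][:r, :].T  # m x r

    # Fit the r x r core Z by least squares over the observed entries:
    #   min_Z  sum_{(i,j) in omega} (M_ij - u_i^T Z v_j)^2
    # Each entry is linear in vec(Z): u_i^T Z v_j = kron(v_j, u_i)^T vec(Z).
    A = np.stack([np.kron(V_hat[j], U_hat[i]) for i, j in omega])
    b = np.array([M[i, j] for i, j in omega])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    Z = z.reshape(r, r, order="F")  # invert the column-major vectorization

    return U_hat @ Z @ V_hat.T
```

The dense |Ω| × r² design matrix keeps the sketch short; a serious implementation would exploit the Kronecker structure of the least-squares system rather than materializing it.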
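A similarly hedged sketch of the evaluation protocol quoted above: the observed-entry budget |Ω| = nmr²/nnz(M) and the two relative errors ℓ_s and ℓ_F, each normalized by the error of the best rank-r approximation M_r obtained from the truncated SVD. Function names are ours.

```python
import numpy as np

def omega_budget(M, r):
    """Observed-entry budget from the paper's setup: |Omega| = n*m*r^2 / nnz(M)."""
    n, m = M.shape
    return int(round(n * m * r**2 / np.count_nonzero(M)))

def relative_errors(M, M_hat, r):
    """Relative spectral- and Frobenius-norm errors of an approximation M_hat,
    each normalized by the error of the best rank-r approximation M_r."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    M_r = (U[:, :r] * s[:r]) @ Vh[:r, :]
    ell_s = np.linalg.norm(M - M_hat, 2) / np.linalg.norm(M - M_r, 2)
    ell_F = np.linalg.norm(M - M_hat, "fro") / np.linalg.norm(M - M_r, "fro")
    return ell_s, ell_F
```

Per the quoted protocol, each configuration would be run 10 times with fresh random samples and the mean of these errors reported.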