Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning

Authors: Chao Chen, Dongsheng Li, Junchi Yan, Hanchi Huang, Xiaokang Yang (pp. 7011-7019)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment: This section studies the performance of the proposed SGMC algorithm in top-N recommendation tasks. We demonstrate that SGMC can not only make scalable and accurate recommendations on large datasets, but can also explain its recommendations at the micro and macro levels. All experiments are conducted on a server with an Intel Xeon(R) E5-2678 2.50GHz CPU and 128GB RAM. Experimental Setup. Datasets: We use three large-scale real-world datasets: (1) MovieLens 20M data (ML20M) (Harper and Konstan 2015) (2×10^7 ratings of 138,493 users and 26,611 items); (2) Netflix prize data (Netflix) (Bennett, Lanning et al. 2007) (10^8 ratings of 480,189 users and 17,770 items); and (3) Million Song data (MSD) (Bertin-Mahieux et al. 2011) (5×10^7 ratings of 1,019,318 users and 384,546 items). Note that the data here is binarized, and each dataset is split randomly into train and test sets with a ratio of 9:1. We adopt the ranking metrics of normalized discounted cumulative gain (NDCG) and F1 score, and all reported results are averaged over five different random train-test splits.
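The 9:1 random split and NDCG evaluation described in the quoted setup can be sketched as follows. This is a minimal NumPy sketch under stated assumptions; the function names, the fixed seed, and the binary-relevance NDCG variant are illustrative, not taken from the authors' code.

```python
import numpy as np

def split_interactions(rows, cols, ratio=0.9, seed=0):
    """Randomly split binarized (user, item) interaction pairs into
    train/test sets with the given ratio (9:1 in the paper's setup)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(rows))
    cut = int(ratio * len(rows))
    train_idx, test_idx = idx[:cut], idx[cut:]
    return (rows[train_idx], cols[train_idx]), (rows[test_idx], cols[test_idx])

def ndcg_at_n(ranked_items, relevant, n=10):
    """NDCG@N for binary relevance: discounted gain of hits in the
    top-N ranking, normalized by the ideal (all-hits-first) ranking."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(ranked_items[:n]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0
```

Averaging the resulting metric over five calls with different seeds mirrors the paper's "averaged over five different random train-test splits" protocol.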
Researcher Affiliation | Collaboration | Chao Chen1, Dongsheng Li3,4, Junchi Yan1,2, Hanchi Huang1, Xiaokang Yang1. 1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; 2 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; 3 School of Computer Science, Fudan University, Shanghai, China; 4 Microsoft Research Asia, Shanghai, China
Pseudocode | No | The paper describes the steps of the SGMC method and its approximations (Step 1, Step 2, Step 3) in paragraph form, but it does not provide a formal pseudocode block or an explicitly labeled 'Algorithm' section.
Open Source Code | No | The paper mentions implementing the SGMC algorithm using Apache Spark and rewriting parts of a baseline (B̂ sparse) in Spark. However, it does not provide any link or explicit statement about making its own implementation publicly available.
Open Datasets | Yes | We use three large-scale real-world datasets: (1) MovieLens 20M data (ML20M) (Harper and Konstan 2015) (2×10^7 ratings of 138,493 users and 26,611 items); (2) Netflix prize data (Netflix) (Bennett, Lanning et al. 2007) (10^8 ratings of 480,189 users and 17,770 items); and (3) Million Song data (MSD) (Bertin-Mahieux et al. 2011) (5×10^7 ratings of 1,019,318 users and 384,546 items).
Dataset Splits | No | The paper states: 'for each dataset we split it into train and test sets randomly with the ratio of 9:1.' It specifies a train-test split but does not explicitly mention a validation set or any details for one.
Hardware Specification | Yes | All experiments are conducted on a server with an Intel Xeon(R) E5-2678 2.50GHz CPU and 128GB RAM.
Software Dependencies | No | The paper states: 'We implement our SGMC algorithm and its extensions using Apache Spark, where the matrix multiplications are parallelized.' While Apache Spark is mentioned, no version number is provided, nor are other software dependencies with their versions.
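The property the paper's Spark implementation exploits, that each output block of a matrix product can be computed independently and therefore in parallel, can be illustrated with a plain-NumPy blocked multiply. This is a sketch of the idea only, not the authors' Spark code; the block size is arbitrary.

```python
import numpy as np

def blocked_matmul(A, B, block=256):
    """Compute A @ B block by block. Each (i, j) output block depends only
    on a row-panel of A and a column-panel of B, so the blocks are
    independent tasks -- the structure a distributed engine such as Spark
    parallelizes across workers."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                )
    return C
```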
Experiment Setup | Yes | In the following experiments, we use the factor size K = 1000 for all SGMC models and the elastic-net parameters α = 0.007 and γ = 0.01 in Eq. (14). Meanwhile, we sample L = 6000 columns of Ł for approximate SGMC.
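The reported hyperparameters, and the uniform column sampling used for the approximate variant, can be sketched as below. The numeric values come from the quoted setup; the variable names and the uniform (Nystrom-style) sampling helper are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hyperparameters reported in the paper's experimental setup.
SGMC_CONFIG = {
    "factor_size_K": 1000,          # factor size K for all SGMC models
    "elastic_net_alpha": 0.007,     # elastic-net parameter alpha in Eq. (14)
    "elastic_net_gamma": 0.01,      # elastic-net parameter gamma in Eq. (14)
    "num_sampled_columns_L": 6000,  # columns sampled for approximate SGMC
}

def sample_columns(matrix, num_columns, seed=0):
    """Uniformly sample columns without replacement, as in Nystrom-style
    approximations (analogous to sampling L = 6000 columns for the
    approximate SGMC). Returns the sampled submatrix and the indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(matrix.shape[1], size=num_columns, replace=False)
    return matrix[:, idx], idx
```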