Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning

Authors: Chao Chen, Dongsheng Li, Junchi Yan, Hanchi Huang, Xiaokang Yang (pp. 7011-7019)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment: This section studies the performance of the proposed SGMC algorithm in top-N recommendation tasks. We demonstrate that SGMC can not only make scalable and accurate recommendations on large datasets, but can also explain its recommendations at the micro and macro levels. All experiments are conducted on a server with an Intel Xeon(R) E5-2678 2.50GHz CPU and 128GB RAM. Experimental Setup. Datasets: We use three large-scale real-world datasets: (1) MovieLens 20M data (ML20M) (Harper and Konstan 2015) (2×10^7 ratings of 138,493 users and 26,611 items); (2) Netflix prize data (Netflix) (Bennett, Lanning et al. 2007) (10^8 ratings of 480,189 users and 17,770 items); and (3) Million Song data (MSD) (Bertin-Mahieux et al. 2011) (5×10^7 ratings of 1,019,318 users and 384,546 items). Note that the data here is binarized, and each dataset is split randomly into train and test sets with a ratio of 9:1. We adopt the ranking metrics of normalized discounted cumulative gain (NDCG) and F1 score, and all reported results are averaged over five different random train-test splits.
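The 9:1 random split and NDCG evaluation described in the quoted setup can be sketched as follows. This is a minimal NumPy sketch under stated assumptions; the function names, the fixed seed, and the binary-relevance NDCG variant are illustrative, not taken from the authors' code.

```python
import numpy as np

def split_interactions(rows, cols, ratio=0.9, seed=0):
    """Randomly split binarized (user, item) interaction pairs into
    train/test sets with the given ratio (9:1 in the paper's setup)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(rows))
    cut = int(ratio * len(rows))
    train_idx, test_idx = idx[:cut], idx[cut:]
    return (rows[train_idx], cols[train_idx]), (rows[test_idx], cols[test_idx])

def ndcg_at_n(ranked_items, relevant, n=10):
    """NDCG@N for binary relevance: discounted gain of hits in the
    top-N ranking, normalized by the ideal (all-hits-first) ranking."""
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(ranked_items[:n]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0
```

Averaging the resulting metric over five calls with different seeds mirrors the paper's "averaged over five different random train-test splits" protocol.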
Researcher Affiliation | Collaboration | Chao Chen1, Dongsheng Li3,4, Junchi Yan1,2, Hanchi Huang1, Xiaokang Yang1. 1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; 2 Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; 3 School of Computer Science, Fudan University, Shanghai, China; 4 Microsoft Research Asia, Shanghai, China
Pseudocode | No | The paper describes the steps of the SGMC method and its approximations (Step 1, Step 2, Step 3) in paragraph form, but it does not provide a formal pseudocode block or an explicitly labeled 'Algorithm' section.
Open Source Code | No | The paper mentions implementing the SGMC algorithm using Apache Spark and rewriting parts of a baseline (B̂ sparse) in Spark. However, it does not provide any link or explicit statement about making its own implementation publicly available.
Open Datasets | Yes | We use three large-scale real-world datasets: (1) MovieLens 20M data (ML20M) (Harper and Konstan 2015) (2×10^7 ratings of 138,493 users and 26,611 items); (2) Netflix prize data (Netflix) (Bennett, Lanning et al. 2007) (10^8 ratings of 480,189 users and 17,770 items); and (3) Million Song data (MSD) (Bertin-Mahieux et al. 2011) (5×10^7 ratings of 1,019,318 users and 384,546 items).
Dataset Splits | No | The paper states: 'for each dataset we split it into train and test sets randomly with the ratio of 9:1.' It specifies a train-test split but does not explicitly mention a validation set or any details for one.
Hardware Specification | Yes | All experiments are conducted on a server with an Intel Xeon(R) E5-2678 2.50GHz CPU and 128GB RAM.
Software Dependencies | No | The paper states: 'We implement our SGMC algorithm and its extensions using Apache Spark, where the matrix multiplications are parallelized.' While Apache Spark is mentioned, no version number is provided, nor are other software dependencies with their versions.
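The property the paper's Spark implementation exploits, that each output block of a matrix product can be computed independently and therefore in parallel, can be illustrated with a plain-NumPy blocked multiply. This is a sketch of the idea only, not the authors' Spark code; the block size is arbitrary.

```python
import numpy as np

def blocked_matmul(A, B, block=256):
    """Compute A @ B block by block. Each (i, j) output block depends only
    on a row-panel of A and a column-panel of B, so the blocks are
    independent tasks -- the structure a distributed engine such as Spark
    parallelizes across workers."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(0, m, block):
        for j in range(0, n, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                )
    return C
```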
Experiment Setup | Yes | In the following experiments, we use the factor size K = 1000 for all SGMC models and the elastic-net parameters α = 0.007 and γ = 0.01 in Eq. (14). Meanwhile, we sample L = 6000 columns of Ł for approximate SGMC.
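The reported hyperparameters, and the uniform column sampling used for the approximate variant, can be sketched as below. The numeric values come from the quoted setup; the variable names and the uniform (Nystrom-style) sampling helper are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hyperparameters reported in the paper's experimental setup.
SGMC_CONFIG = {
    "factor_size_K": 1000,          # factor size K for all SGMC models
    "elastic_net_alpha": 0.007,     # elastic-net parameter alpha in Eq. (14)
    "elastic_net_gamma": 0.01,      # elastic-net parameter gamma in Eq. (14)
    "num_sampled_columns_L": 6000,  # columns sampled for approximate SGMC
}

def sample_columns(matrix, num_columns, seed=0):
    """Uniformly sample columns without replacement, as in Nystrom-style
    approximations (analogous to sampling L = 6000 columns for the
    approximate SGMC). Returns the sampled submatrix and the indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(matrix.shape[1], size=num_columns, replace=False)
    return matrix[:, idx], idx
```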