Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Smooth neighborhood recommender systems

Authors: Ben Dai, Junhui Wang, Xiaotong Shen, Annie Qu

JMLR 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data (Last.fm music data). ... Section 5 examines the numerical performance of the proposed method in simulation studies and a real application to the Last.fm dataset (http://www.last.fm).
Researcher Affiliation | Academia | Ben Dai (EMAIL), Junhui Wang (EMAIL), School of Data Science, City University of Hong Kong, Kowloon Tong, 999077, Hong Kong; Xiaotong Shen (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA; Annie Qu (EMAIL), Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
Pseudocode | Yes | The computational strategy of ALS is to break the large-scale optimization into multiple small subproblems by alternately fixing either $p_u$ or $q_i$, where each subproblem is a simple penalized least squares regression and can be solved analytically with $J(\cdot) = \|\cdot\|_2^2$. Note that this strategy is applicable as long as $J(\cdot)$ is separable in $p_u$ and $q_i$. For illustration, consider $J(\cdot) = \|\cdot\|_2^2$. At iteration $k$, $\widehat{Q}^{(k)}$ is fixed and the latent factor $p_u$ is updated as
$$\hat{p}_u^{(k+1)} = \operatorname*{argmin}_{p_u} \sum_i \sum_{(u',i') \in \Omega} \omega_{ui,u'i'} \big(r_{u'i'} - p_u^{\top} \hat{q}_i^{(k)}\big)^2 + \lambda_1 \|p_u\|_2^2.$$
Similarly, with $\widehat{P}^{(k+1)}$ fixed, $q_i$ is updated as
$$\hat{q}_i^{(k+1)} = \operatorname*{argmin}_{q_i} \sum_u \sum_{(u',i') \in \Omega} \omega_{ui,u'i'} \big(r_{u'i'} - (\hat{p}_u^{(k+1)})^{\top} q_i\big)^2 + \lambda_2 \|q_i\|_2^2.$$
Each subproblem is then solved analytically:
$$\hat{p}_u^{(k+1)} = \Big(\sum_i \hat{q}_i^{(k)} (\hat{q}_i^{(k)})^{\top} + \lambda_1 I_K\Big)^{-1} \sum_i r^{\Omega}_{ui} \hat{q}_i^{(k)}, \quad (6)$$
$$\hat{q}_i^{(k+1)} = \Big(\sum_u \hat{p}_u^{(k+1)} (\hat{p}_u^{(k+1)})^{\top} + \lambda_2 I_K\Big)^{-1} \sum_u r^{\Omega}_{ui} \hat{p}_u^{(k+1)}. \quad (7)$$
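The alternating ridge-regression structure quoted above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the neighborhood weights $\omega_{ui,u'i'}$ are dropped (reducing to plain weighted-by-observation ALS), and all names (`R`, `mask`, `P`, `Q`, `lam1`, `lam2`) are illustrative.

```python
import numpy as np

def als_step(R, mask, P, Q, lam1, lam2):
    """One simplified ALS iteration on a rating matrix R (n_users x n_items).

    mask[u, i] = 1 if rating r_ui is observed (the set Omega), else 0.
    P: n_users x K user factors; Q: n_items x K item factors.
    Each update is the closed-form ridge solution analogous to (6)-(7),
    without the neighborhood weights used by the proposed sSVD method.
    """
    n_users, n_items = R.shape
    K = P.shape[1]
    # Update each p_u with Q fixed: (sum_i q_i q_i^T + lam1 I)^{-1} sum_i r_ui q_i
    for u in range(n_users):
        idx = mask[u] > 0
        Qo = Q[idx]                      # factors of the items rated by user u
        A = Qo.T @ Qo + lam1 * np.eye(K)
        P[u] = np.linalg.solve(A, Qo.T @ R[u, idx])
    # Update each q_i with P fixed: (sum_u p_u p_u^T + lam2 I)^{-1} sum_u r_ui p_u
    for i in range(n_items):
        idx = mask[:, i] > 0
        Po = P[idx]                      # factors of the users who rated item i
        A = Po.T @ Po + lam2 * np.eye(K)
        Q[i] = np.linalg.solve(A, Po.T @ R[idx, i])
    return P, Q
```

Each subproblem is a $K \times K$ linear solve, which is why ALS scales to large rating matrices: the cost is linear in the number of observed ratings for fixed $K$.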
Open Source Code | No | The paper mentions that code for *competitor* methods (RBM, CRBM, SoftImpute, SSR, gSVD) is publicly available or provided by their authors. However, it does not provide concrete access to source code for the proposed method (sSVD).
Open Datasets | Yes | Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data (Last.fm music data). ... Section 5 examines the numerical performance of the proposed method in simulation studies and a real application to the Last.fm dataset (http://www.last.fm). ... In this section, we analyze an online music dataset from Last.fm (http://www.last.fm), which was released in the second International Workshop HetRec 2011 (http://ir.ii.uam.es/hetrec2011).
Dataset Splits | Yes | The remaining ratings are randomly split into training, tuning, and testing sets with 60%, 15%, and 25% of the observations, respectively. ... For evaluation, we apply 5-fold cross-validation over a random partition of the original dataset and calculate the RMSE as in Koyejo and Ghosh (2011).
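The 60/15/25 random split described in this row can be sketched as follows; `split_ratings` and its signature are hypothetical illustrations, not from the paper, and operate on indices into an array of observed (user, item, rating) triples.

```python
import numpy as np

def split_ratings(n, seed=None):
    """Randomly partition n observed ratings into 60% training,
    15% tuning, and 25% testing index sets, as in the quoted setup."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(0.60 * n)
    n_tune = int(0.15 * n)
    train = perm[:n_train]
    tune = perm[n_train:n_train + n_tune]
    test = perm[n_train + n_tune:]
    return train, tune, test
```

The tuning set selects hyperparameters such as $\lambda$, and the held-out test set is used only for the final RMSE.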
Hardware Specification | Yes | In our implementation, the algorithm is coded through PyMP, which is a Python version of OpenMP in C, and can handle a dataset with a size up to the order of $10^8$ on a quad-core computer with one 3.40GHz CPU and 8GB memory.
Software Dependencies | No | The paper mentions that the algorithm is coded through PyMP, a Python version of OpenMP in C, but it does not specify version numbers for PyMP, Python, or any other key libraries or solvers.
Experiment Setup | Yes | For tuning parameter selection, we set the learning rate, the momentum rate, and the number of hidden units for RBM and CRBM to 0.005, 0.9, and 100, respectively. For rSVD, gSVD, and sSVD, we set the tuning parameter $K$ to be the true one, and the optimal $\lambda$ is determined by a grid search over $\{10^{(\nu - 31)/10} : \nu = 1, \ldots, 61\}$. For the proposed sSVD, a Gaussian kernel is used with the window size $h$ set to the median distance among all user-item pairs. ... For SVD-based methods, we set $K = 5$ and select the optimal $\lambda$ from $\{1, \ldots, 25\}$ through 5-fold cross-validation. For CRBM, the parameters are set as suggested in Nguyen and Lauw (2016); for SSR, the parameters are determined through cross-validation as suggested in Zhao et al. (2016).
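The $\lambda$ grid quoted above, $\{10^{(\nu - 31)/10} : \nu = 1, \ldots, 61\}$, is simply 61 log-spaced values from $10^{-3}$ to $10^{3}$; a one-line reconstruction (the variable name `lambdas` is ours):

```python
import numpy as np

# 61 candidate regularization values, log-spaced from 10^-3 to 10^3,
# matching the grid 10^((nu - 31)/10) for nu = 1, ..., 61.
lambdas = 10.0 ** ((np.arange(1, 62) - 31) / 10)
```

The grid is symmetric about $\lambda = 1$ (reached at $\nu = 31$), with a multiplicative step of $10^{0.1} \approx 1.26$ between consecutive values.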