Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Smooth neighborhood recommender systems
Authors: Ben Dai, Junhui Wang, Xiaotong Shen, Annie Qu
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data (the Last.fm music data). ... Section 5 examines the numerical performance of the proposed method in simulation studies and a real application to the Last.fm dataset (http://www.last.fm). |
| Researcher Affiliation | Academia | Ben Dai EMAIL Junhui Wang EMAIL School of Data Science City University of Hong Kong Kowloon Tong, 999077, Hong Kong Xiaotong Shen EMAIL School of Statistics University of Minnesota Minneapolis, MN 55455, USA Annie Qu EMAIL Department of Statistics University of Illinois at Urbana Champaign Champaign, IL 61820, USA |
| Pseudocode | Yes | The computational strategy of ALS is to break large-scale optimization into multiple small subproblems by alternately fixing either $p_u$ or $q_i$, where each subproblem is a simple penalized least squares regression and can be solved analytically with $J(\cdot) = \lVert \cdot \rVert_2^2$. Note that this strategy is applicable as long as $J(\cdot)$ is separable for $p_u$ and $q_i$. For illustration, consider $J(\cdot) = \lVert \cdot \rVert_2^2$. At iteration $k$, $\hat{Q}^{(k)}$ is fixed and the latent factor $p_u$ is updated as $\hat{p}_u^{(k+1)} = \arg\min_{p_u} \sum_i \sum_{(u',i') \in \Omega} \omega_{ui,u'i'} \big(r_{u'i'} - p_u^T \hat{q}_i^{(k)}\big)^2 + \lambda_1 \lVert p_u \rVert_2^2$. Similarly, with fixed $\hat{P}^{(k+1)}$, $q_i$ is updated as $\hat{q}_i^{(k+1)} = \arg\min_{q_i} \sum_u \sum_{(u',i') \in \Omega} \omega_{ui,u'i'} \big(r_{u'i'} - (\hat{p}_u^{(k+1)})^T q_i\big)^2 + \lambda_2 \lVert q_i \rVert_2^2$. Then each subproblem is solved analytically: $\hat{p}_u^{(k+1)} = \big(\sum_i \hat{q}_i^{(k)} (\hat{q}_i^{(k)})^T + \lambda_1 I_K\big)^{-1} \sum_i r_{ui}^{\Omega}\, \hat{q}_i^{(k)}$ (6), $\hat{q}_i^{(k+1)} = \big(\sum_u \hat{p}_u^{(k+1)} (\hat{p}_u^{(k+1)})^T + \lambda_2 I_K\big)^{-1} \sum_u r_{ui}^{\Omega}\, \hat{p}_u^{(k+1)}$ (7). |
| Open Source Code | No | The paper mentions that code for *competitor* methods (RBM, CRBM, Soft-Impute, SSR, gSVD) is publicly available or provided by their authors. However, it does not provide concrete access to source code for the proposed method (sSVD). |
| Open Datasets | Yes | Finally, we illustrate that the proposed method improves substantially over its competitors in simulated examples and real benchmark data (the Last.fm music data). ... Section 5 examines the numerical performance of the proposed method in simulation studies and a real application to the Last.fm dataset (http://www.last.fm). ... In this section, we analyze an online music dataset from Last.fm (http://www.last.fm), which was released in the second International Workshop HetRec 2011 (http://ir.ii.uam.es/hetrec2011). |
| Dataset Splits | Yes | The remaining ratings are randomly split into training, tuning and testing sets with 60%, 15%, and 25% of the observations, respectively. ... For evaluation, we apply 5-fold cross-validation over a random partition of the original dataset, and calculate the RMSE as in Koyejo and Ghosh (2011). |
| Hardware Specification | Yes | In our implementation, the algorithm is coded through PyMP, which is a Python version of OpenMP in C, and can handle a dataset with a size up to the order of $10^8$ on a quad-core computer with one 3.40GHz CPU and 8GB memory. |
| Software Dependencies | No | The paper mentions that the algorithm is coded through PyMP, which is a Python version of OpenMP in C, but it does not specify version numbers for PyMP, Python, or C, or any other key libraries/solvers. |
| Experiment Setup | Yes | For tuning parameter selection, we set the learning rate, the momentum rate and the number of hidden units for RBM and CRBM as 0.005, 0.9, and 100, respectively. For rSVD, gSVD and sSVD, we set the tuning parameter $K$ to be the true one, and the optimal $\lambda$ is determined by a grid search over $\{10^{(\nu-31)/10};\ \nu = 1, \ldots, 61\}$. For the proposed sSVD, a Gaussian kernel is used with the window size $h$ being the median distance among all user-item pairs. ... For SVD-based methods, we set $K = 5$, and select the optimal $\lambda$ from $\{1, \ldots, 25\}$ through the 5-fold cross-validation. For CRBM, the parameters are set as suggested in Nguyen and Lauw (2016); and for SSR, the parameters are determined through cross-validation as suggested in Zhao et al. (2016). |
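The ALS updates quoted in the Pseudocode row can be sketched in plain NumPy. This is a minimal, hedged illustration: it implements the classic *unweighted* penalized ALS (each subproblem a ridge regression solved in closed form) and omits the paper's smooth neighborhood weights $\omega_{ui,u'i'}$, which would make each subproblem depend on neighboring user-item pairs. All names here (`als_step`, `mask`, `lam1`, `lam2`) are illustrative, not from the paper's code.

```python
import numpy as np

def als_step(R, mask, P, Q, lam1, lam2):
    """One ALS sweep: update user factors P with Q fixed, then Q with P fixed.

    R    : (n_users, n_items) rating matrix
    mask : boolean array marking observed entries (the set Omega)
    Each row update is a ridge regression with closed-form solution,
    mirroring equations (6) and (7) without the neighborhood weights.
    """
    K = P.shape[1]
    # User-factor update: solve (Q_o^T Q_o + lam1 I_K) p_u = Q_o^T r_u.
    for u in range(R.shape[0]):
        idx = mask[u]
        if not idx.any():
            continue
        Qo = Q[idx]                              # factors of items user u rated
        A = Qo.T @ Qo + lam1 * np.eye(K)
        P[u] = np.linalg.solve(A, Qo.T @ R[u, idx])
    # Item-factor update, symmetric to the user step.
    for i in range(R.shape[1]):
        idx = mask[:, i]
        if not idx.any():
            continue
        Po = P[idx]                              # factors of users who rated item i
        A = Po.T @ Po + lam2 * np.eye(K)
        Q[i] = np.linalg.solve(A, Po.T @ R[idx, i])
    return P, Q
```

Because each subproblem is strictly convex, alternating these two closed-form solves decreases the penalized objective at every sweep, which is the computational appeal of ALS the quote describes.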
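The Dataset Splits row quotes a random 60%/15%/25% partition into training, tuning, and testing sets. A minimal sketch of one way to produce such a split (the paper does not publish its splitting code, so this is only an assumption about the mechanics):

```python
import numpy as np

# Hypothetical 60/15/25 train/tune/test split over n observed ratings,
# mirroring the proportions quoted in the table.
rng = np.random.default_rng(42)
n = 1000
perm = rng.permutation(n)                 # shuffle rating indices
n_train, n_tune = int(0.60 * n), int(0.15 * n)
train = perm[:n_train]
tune = perm[n_train:n_train + n_tune]
test = perm[n_train + n_tune:]            # remaining 25%
```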