Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Ordinal Non-negative Matrix Factorization for Recommendation
Authors: Olivier Gouvert, Thomas Oberlin, Cédric Févotte
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report recommendation experiments on explicit and implicit datasets, and show that OrdNMF outperforms BePoF and PF applied to binarized data. |
| Researcher Affiliation | Academia | ¹IRIT, Université de Toulouse, CNRS, France; ²ISAE-SUPAERO, Université de Toulouse, France. |
| Pseudocode | Yes | Algorithm 1 CAVI for IG-OrdNMF. |
| Open Source Code | Yes | All the Python codes are available on https://github.com/Oligou/OrdNMF. |
| Open Datasets | Yes | MovieLens (Harper & Konstan, 2015). This dataset contains the ratings of users on movies on a scale from 1 to 10. These explicit feedbacks correspond to ordinal data. We consider that the class 0 corresponds to the absence of a rating for a couple user-movie. The histogram of the ordinal data is represented in blue on Figure 4. We pre-process a subset of the data as in (Liang et al., 2016), keeping only users and movies that have more than 20 interactions. We obtain U = 20k users and I = 12k movies. Taste Profile (Bertin-Mahieux et al., 2011). This dataset, provided by the Echo Nest, contains the play counts of users on a catalog of songs. As mentioned in the introduction, we choose to quantize these counts on a predefined scale in order to obtain ordinal data. |
| Dataset Splits | No | The paper specifies a train/test split ("the train set contains 80%... the test set contains the remaining 20%") but does not explicitly mention a separate validation set. |
| Hardware Specification | Yes | The computer used for these experiments was a MacBook Pro with an Intel Core i5 processor (2.9 GHz) and 16 GB RAM. |
| Software Dependencies | No | The paper mentions that the code is in Python, but does not specify version numbers for Python itself or any specific libraries (e.g., PyTorch, TensorFlow, scikit-learn) that would be needed for replication. |
| Experiment Setup | Yes | For all models, we select the shape hyperparameters α_W = α_H = 0.3 among {0.1, 0.3, 1} (Gopalan et al., 2015). The number of latent factors is chosen among K ∈ {25, 50, 100, 150, 200, 350} for the best NDCG score with threshold s = 8 for the MovieLens dataset, and s = 1 for the Taste Profile dataset. All the algorithms are run 5 times with random initializations and are stopped when the relative increment of the expected lower bound (ELBO) falls under τ = 10⁻⁵. |
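
The experiment setup quoted above relies on two mechanics: an NDCG score computed at a rating threshold s, and a stopping rule based on the relative increment of the ELBO. The sketch below illustrates one plausible reading of both and is not taken from the authors' repository. It assumes binary relevance (an item counts as relevant when its test rating is at least s), a hypothetical ranking cutoff `k`, and a relative increment normalized by the absolute value of the previous ELBO. The function names `ndcg_at_k`, `relative_increment`, and `has_converged` are illustrative only.

```python
import math


def ndcg_at_k(scores, test_ratings, s, k=100):
    """NDCG@k for one user, assuming binary relevance: rating >= s.

    scores       -- predicted scores, one per candidate item (hypothetical input)
    test_ratings -- held-out ordinal ratings for the same items (0 = no rating)
    s            -- relevance threshold (e.g. 8 for MovieLens, 1 for Taste Profile)
    k            -- ranking cutoff (an assumption; not stated in the table above)
    """
    # Rank item indices by predicted score, highest first, and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    relevance = [1.0 if test_ratings[i] >= s else 0.0 for i in ranked]
    # Discounted cumulative gain with the usual log2(position + 1) discount.
    dcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(relevance))
    # Ideal DCG: all relevant items placed at the top of the list.
    n_relevant = sum(1 for r in test_ratings if r >= s)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(n_relevant, k)))
    return dcg / idcg if idcg > 0 else 0.0


def relative_increment(prev_elbo, curr_elbo):
    """Relative change of the ELBO between two consecutive iterations."""
    if prev_elbo == 0:
        return math.inf  # avoid division by zero; treat as not converged
    return abs(curr_elbo - prev_elbo) / abs(prev_elbo)


def has_converged(prev_elbo, curr_elbo, tol=1e-5):
    """Stop when the relative ELBO increment falls under tol (tau in the text)."""
    return relative_increment(prev_elbo, curr_elbo) < tol
```

A typical use would average `ndcg_at_k` over all test users for each candidate K, and call `has_converged` once per iteration of the inference loop with the two most recent ELBO values.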