Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Safe Collaborative Filtering
Authors: Riku Togashi, Tatsushi Oka, Naoto Ohsaka, Tetsuro Morimura
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on real-world datasets demonstrates the excellent tail performance of our approach while maintaining competitive computational efficiency. |
| Researcher Affiliation | Collaboration | Riku Togashi (a), Tatsushi Oka (b), Naoto Ohsaka (a), Tetsuro Morimura (a); (a) CyberAgent, (b) Department of Economics, Keio University |
| Pseudocode | Yes | Algorithm 1: SAFER2 solver. |
| Open Source Code | Yes | Our source code is publicly available at https://github.com/riktor/safer2-recommender. |
| Open Datasets | Yes | We experiment with two MovieLens datasets (ML-1M and ML-20M) (Harper and Konstan, 2015) and Million Song Dataset (MSD) (Bertin-Mahieux et al., 2011). |
| Dataset Splits | Yes | We then consider 80% of users for training (i.e., $\{V_i\}_{i \in U}$). The remaining 10% of users in two holdout splits are used for validation and testing. |
| Hardware Specification | Yes | The reported numbers are the averaged runtime through 50 epochs measured using 86.4 GB RAM and Intel(R) Xeon(R) CPU @ 2.00GHz with 96 CPU cores. We implemented Mult-VAE using PyTorch and utilized an NVIDIA P100 GPU to speed up its training. |
| Software Dependencies | No | The paper mentions software like 'Eigen' and 'PyTorch' but does not specify their version numbers, which are required for reproducible software dependencies. |
| Experiment Setup | Yes | In all models, we initialize U and V with Gaussian noise with standard deviation σ/√d where σ = 0.1 in all datasets (Rendle et al., 2022) and tune β0 and λ. We set α = 0.3 in Eq. (5) and Eq. (7). For SAFER2, we also search the bandwidth h and set the number of NR iterations as L = 5. The dimensionality d of user/item embeddings is set to 32, 256, and 512 for ML-1M, ML-20M, and MSD, respectively. |
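
The Dataset Splits row describes a user-level partition: 80% of users for training and 10% each for validation and testing. A minimal sketch of such a split is shown below. It is an illustration only, not the authors' code: the function name `split_users`, the seed, and the user count of 1000 are hypothetical, and the paper's actual shuffling and holdout procedure may differ.

```python
import numpy as np


def split_users(num_users, train_frac=0.8, seed=0):
    """Partition user indices into disjoint train / validation / test sets.

    Hypothetical helper: train_frac of the users go to training and the
    remainder is divided as evenly as possible between validation and test.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_users)  # shuffled user indices

    n_train = int(round(train_frac * num_users))
    n_val = (num_users - n_train) // 2

    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]  # whatever is left after train and val
    return train, val, test


# Example with a hypothetical count of 1000 users.
train, val, test = split_users(1000)
print(len(train), len(val), len(test))
```

Splitting by user rather than by interaction means validation and test users are never seen during training, which matches the holdout-user wording quoted in the table.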