Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Optimization and Analysis of the pAp@k Metric for Recommender Systems
Authors: Gaurush Hiranandani, Warut Vijitbenjaronk, Sanmi Koyejo, Prateek Jain
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis and experimental evaluation suggest that pAp@k indeed exhibits a certain dual behavior with respect to partial AUC and precision@k. Moreover, the proposed methods outperform all the baselines in various applications. Taken together, our results motivate the use of pAp@k for large-scale recommender systems with heterogeneous user-engagement. In this section, we present evaluation of our methods on simulated and real data. |
| Researcher Affiliation | Collaboration | 1University of Illinois at Urbana-Champaign, Illinois, USA 2Microsoft Research, Bengaluru, Karnataka, India. |
| Pseudocode | Yes | Algorithm 1 GD-pAp@k-surr and Algorithm 2 Subgradient calculation for pAp@k surrogates |
| Open Source Code | Yes | Source code: https://github.com/gaurush-hiranandani/pap-k |
| Open Datasets | Yes | Movie Recommendation (70K instances, 15.5K positives, 638 users, d = 90, k = 8–24): We use the Movielens 100K dataset (Harper & Konstan, 2015)... Citation Recommendation (142K instances, 21K positives, 2477 users, d = 157, k = 6–18): The task in the citation dataset (Budhiraja et al., 2020)... Image Recommendation (670K instances, 111K positives, 2498 users, d = 150, k = 5–25): We take the Behance dataset (He et al., 2016)... |
| Dataset Splits | No | For all the methods, including baselines, the learning rate and regularization parameters are cross validated on the set $\{10^{-4}, 2 \times 10^{-4}, 5 \times 10^{-4}, 10^{-3}, \ldots, 0.5\}$ and $10^{\{-3, \ldots, 1\}}$, respectively. While cross-validation implies a splitting strategy, the paper does not provide specific details on the splits (e.g., number of folds, percentages for train/validation sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions) that would be needed for replication. |
| Experiment Setup | Yes | For all the methods, including baselines, the learning rate and regularization parameters are cross validated on the set $\{10^{-4}, 2 \times 10^{-4}, 5 \times 10^{-4}, 10^{-3}, \ldots, 0.5\}$ and $10^{\{-3, \ldots, 1\}}$, respectively. We fix $\eta_t = \eta/\sqrt{t+1}$ in our methods and use a regularized version of the surrogates by adding $\lambda \lVert w \rVert^2$. |
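
The step-size schedule and regularization quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names `step_size` and `regularized_step` are hypothetical, the iteration counter `t` is assumed to start at 0, the regularizer is assumed to be the squared Euclidean norm (so its gradient is `2 * lam * w`), and the surrogate subgradient is taken as an input rather than computed, since the table does not describe it.

```python
import math


def step_size(eta, t):
    """Decaying step size eta_t = eta / sqrt(t + 1).

    `eta` is the base learning rate (cross-validated in the paper);
    `t` is the iteration counter, assumed here to be 0-based.
    """
    return eta / math.sqrt(t + 1)


def regularized_step(w, subgrad, eta, lam, t):
    """One subgradient step on surrogate(w) + lam * ||w||^2.

    `subgrad` is a subgradient of the (unregularized) surrogate at `w`,
    supplied by the caller. The squared-norm regularizer is an assumption;
    it contributes 2 * lam * w to the update direction.
    """
    eta_t = step_size(eta, t)
    return [wi - eta_t * (gi + 2.0 * lam * wi) for wi, gi in zip(w, subgrad)]


if __name__ == "__main__":
    w = [1.0, -2.0]
    g = [0.0, 0.0]  # placeholder subgradient, for illustration only
    print(step_size(0.1, 3))
    print(regularized_step(w, g, eta=0.5, lam=0.5, t=0))
```

Because the base learning rate and the regularization strength are both selected by cross-validation, they are left here as plain arguments rather than hard-coded values.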