Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Gaussian Process Bandits for Top-k Recommendations
Authors: Mohit Yadav, Cameron Musco, Daniel R. Sheldon
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms other baselines across various scenarios. This section empirically evaluates the proposed GP-TopK bandit algorithms for the top-k recommendations using a simulation based on the MovieLens dataset [4]. |
| Researcher Affiliation | Academia | Mohit Yadav University of Massachusetts Amherst EMAIL Daniel Sheldon University of Massachusetts Amherst EMAIL Cameron Musco University of Massachusetts Amherst EMAIL |
| Pseudocode | Yes | Algorithm 1 Contextual Bandit Algorithm for Top-k Recommendations. Algorithm 2 Computing Weighted Convolutional Kendall Kernel. Algorithm 3 Computing Convolutional Kendall Kernel [10]. |
| Open Source Code | No | Our code can be accessed using this hyper-link. (The hyperlink itself is not present in the PDF text, thus not providing concrete access.) |
| Open Datasets | Yes | using a simulation based on the MovieLens dataset [4]. [4] F Maxwell Harper and Joseph A Konstan. The MovieLens datasets: History and context. In Transactions on Interactive Intelligent Systems, volume 5, pages 1–19. ACM, 2015. |
| Dataset Splits | Yes | We consider a 1M variant of the MovieLens dataset, which contains 1 million ratings from 6040 users for 3677 items. Both context and item embeddings, i.e., $c_u$ and $\theta_i$, are 5-dimensional, optimized by considering the 5-fold performance on this dataset. |
| Hardware Specification | Yes | We utilized multiple NVIDIA Tesla M40 GPUs with 40 GB RAM on our in-house cluster for our experiments. |
| Software Dependencies | No | The paper describes its experimental setup and methods but does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For setting up the reward functions, we utilize a similarity function $s(c, \theta) := \varsigma(a \cdot (c^\top \theta) - b)$ to measure similarity between any user and item embeddings, where a and b are similarity score and shift scalars, respectively. We set a and b to 6 and 0.3, respectively, to fully utilize the range of the similarity function, as assessed by evaluating its value for many arms. We set $\lambda = 0.75$ to emphasize relevance over diversity. For the $\epsilon$-greedy baselines, various values of $\epsilon$ are considered, specifically $\epsilon \in \{0.01, 0.05, 0.1\}$. For MAB-UCB baseline, ...$\beta_{mab}$ values within the set {0.1, 0.25, 0.5}... For the parameters of proposed GP-TopK bandit algorithms, we set $\beta_t = \beta_{gp} \cdot \log(\lvert X \rvert \cdot t^2 \cdot \pi^2)$ with $\beta_{gp} \in \{0.05, 0.1, 0.5\}$. The selection of $\sigma$ for all variants is determined by optimizing the log-likelihood of the observed data after every 10 rounds by considering values in the set {0.01, 0.05, 0.1}. We use 10 restarts and 5 steps in each search direction for the local search, starting with 1000 initial candidates. |
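
The quantities quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative reading of the quoted text, not the authors' code. It assumes that ς denotes the logistic sigmoid and that the shift b is subtracted, since the operators did not survive extraction. The function names, the `num_arms` argument (standing in for |X|), and the grid constants are hypothetical names introduced here.

```python
import math

# Hyperparameter grids quoted in the Experiment Setup row.
EPSILON_GRID = (0.01, 0.05, 0.1)   # epsilon-greedy baselines
BETA_MAB_GRID = (0.1, 0.25, 0.5)   # MAB-UCB baseline
BETA_GP_GRID = (0.05, 0.1, 0.5)    # GP-TopK variants
SIGMA_GRID = (0.01, 0.05, 0.1)     # sigma candidates, re-selected every 10 rounds


def sigmoid(x: float) -> float:
    """Logistic sigmoid (assumed meaning of the symbol in the quoted formula)."""
    return 1.0 / (1.0 + math.exp(-x))


def similarity(c, theta, a: float = 6.0, b: float = 0.3) -> float:
    """s(c, theta) = sigmoid(a * <c, theta> - b).

    The minus sign before b is an assumption; a = 6 and b = 0.3 are the
    values stated in the quoted setup.
    """
    dot = sum(ci * ti for ci, ti in zip(c, theta))
    return sigmoid(a * dot - b)


def beta_t(t: int, num_arms: int, beta_gp: float) -> float:
    """beta_t = beta_gp * log(|X| * t^2 * pi^2), as quoted.

    The logarithm is split into a sum so that a very large arm count
    (an arbitrary-size Python int) does not overflow a float product.
    """
    log_term = math.log(num_arms) + 2.0 * math.log(t) + 2.0 * math.log(math.pi)
    return beta_gp * log_term


if __name__ == "__main__":
    user = [0.2, -0.1, 0.4, 0.0, 0.3]    # made-up 5-dimensional embeddings
    item = [0.5, 0.1, 0.2, -0.3, 0.4]
    print(similarity(user, item))
    print([beta_t(t=10, num_arms=1000, beta_gp=bg) for bg in BETA_GP_GRID])
```

Splitting the logarithm matters in the top-k setting because the number of candidate rankings grows combinatorially with the number of items, so forming the product directly may exceed floating-point range.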