Gaussian Process Bandits for Top-k Recommendations
Authors: Mohit Yadav, Cameron Musco, Daniel R. Sheldon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms other baselines across various scenarios. This section empirically evaluates the proposed GP-TopK bandit algorithms for top-k recommendations using a simulation based on the MovieLens dataset [4]. |
| Researcher Affiliation | Academia | Mohit Yadav, University of Massachusetts Amherst (ymohit@cs.umass.edu); Daniel Sheldon, University of Massachusetts Amherst (sheldon@cs.umass.edu); Cameron Musco, University of Massachusetts Amherst (cmusco@cs.umass.edu) |
| Pseudocode | Yes | Algorithm 1: Contextual Bandit Algorithm for Top-k Recommendations; Algorithm 2: Computing Weighted Convolutional Kendall Kernel; Algorithm 3: Computing Convolutional Kendall Kernel [10]. (A hedged sketch of the underlying Kendall kernel follows the table.) |
| Open Source Code | No | Our code can be accessed using this hyper-link. (The hyperlink itself is not present in the PDF text, so the code is not concretely accessible.) |
| Open Datasets | Yes | using a simulation based on the MovieLens dataset [4]. [4] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5:1–19, 2015. |
| Dataset Splits | Yes | We consider the 1M variant of the MovieLens dataset, which contains 1 million ratings from 6040 users for 3677 items. Both context and item embeddings, i.e., c_u and θ_i, are 5-dimensional, optimized by considering the 5-fold performance on this dataset. (A hedged embedding-fitting sketch follows the table.) |
| Hardware Specification | Yes | We utilized multiple NVIDIA Tesla M40 GPUs with 40 GB RAM on our in-house cluster for our experiments. |
| Software Dependencies | No | The paper describes its experimental setup and methods but does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For setting up the reward functions, we utilize a similarity function s(c, θ) := ς(a·(cᵀθ) − b) to measure the similarity between any user and item embeddings, where a and b are similarity score and shift scalars, respectively. We set a and b to 6 and 0.3, respectively, to fully utilize the range of the similarity function, as assessed by evaluating its value for many arms. We set λ = 0.75 to emphasize relevance over diversity. For the ϵ-greedy baselines, various values of ϵ were considered, specifically ϵ ∈ {0.01, 0.05, 0.1}. For the MAB-UCB baseline, ...β_mab values within the set {0.1, 0.25, 0.5}... For the parameters of the proposed GP-TopK bandit algorithms, we set β_t = β_gp · log(\|X\|·t²·π²) with β_gp ∈ {0.05, 0.1, 0.5}. The selection of σ for all variants is determined by optimizing the log-likelihood of the observed data after every 10 rounds, considering values in the set {0.01, 0.05, 0.1}. We use 10 restarts and 5 steps in each search direction for the local search, starting with 1000 initial candidates. (A hedged sketch of this setup follows the table.) |
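
The paper's Algorithms 2 and 3 compute (weighted) convolutional Kendall kernels over top-k recommendation lists. As a point of reference, the sketch below implements only the textbook Kendall-tau kernel between two full rankings that those algorithms generalize; the function name, the explicit pairwise loop, and the rank-vector encoding are illustrative assumptions, not the paper's procedure.

```python
from itertools import combinations

def kendall_kernel(rank_a, rank_b):
    """Textbook Kendall (tau) kernel between two rankings of the same item set:
    fraction of concordant item pairs minus fraction of discordant pairs.
    This is only the classic kernel underlying the paper's Algorithms 2-3,
    not their convolutional / weighted top-k variants."""
    # rank_a[i] is the position assigned to item i by ranking a (lower = better).
    n = len(rank_a)
    assert len(rank_b) == n and n >= 2
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s_a = rank_a[i] - rank_a[j]
        s_b = rank_b[i] - rank_b[j]
        if s_a * s_b > 0:
            concordant += 1
        elif s_a * s_b < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Identical rankings give 1.0, fully reversed rankings give -1.0.
print(kendall_kernel([0, 1, 2, 3], [0, 1, 2, 3]))   # 1.0
print(kendall_kernel([0, 1, 2, 3], [3, 2, 1, 0]))   # -1.0
```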
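For the dataset-splits row, the following is a minimal sketch of one way to obtain 5-dimensional user and item embeddings (c_u, θ_i) from the MovieLens 1M ratings. The file path and the use of a rank-5 truncated SVD are assumptions; the paper only states that 5-dimensional embeddings were selected based on 5-fold performance, without specifying the factorization method.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical path to the MovieLens 1M ratings file (user::item::rating::timestamp).
ratings = pd.read_csv(
    "ml-1m/ratings.dat", sep="::", engine="python",
    names=["user", "item", "rating", "ts"],
)
users = ratings["user"].astype("category").cat.codes
items = ratings["item"].astype("category").cat.codes
R = csr_matrix((ratings["rating"].astype(float), (users, items)))

# Rank-5 factorization: rows of user_emb play the role of contexts c_u,
# rows of item_emb play the role of item embeddings theta_i.
U, s, Vt = svds(R, k=5)
user_emb = U * np.sqrt(s)        # shape (n_users, 5)
item_emb = Vt.T * np.sqrt(s)     # shape (n_items, 5)
print(user_emb.shape, item_emb.shape)
```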
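The experiment-setup row can be made concrete with a small sketch of the similarity function and the exploration coefficient. The choice of a sigmoid for ς and the minus sign before b are reconstructions from the quoted constants (the PDF extraction dropped the operator); a = 6, b = 0.3, λ = 0.75, and β_gp ∈ {0.05, 0.1, 0.5} are taken directly from the quote.

```python
import numpy as np

# Constants from the quoted setup: scale a = 6, shift b = 0.3,
# relevance/diversity trade-off lambda = 0.75.
A, B, LAMBDA = 6.0, 0.3, 0.75

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity(c, theta, a=A, b=B):
    """Assumed form s(c, theta) = sigmoid(a * (c^T theta) - b) between a 5-d
    user context c and a 5-d item embedding theta; the sigmoid and the sign
    of b are reconstructions, not confirmed by the extracted text."""
    return sigmoid(a * np.dot(c, theta) - b)

def beta_t(t, n_arms, beta_gp=0.1):
    """GP-UCB-style exploration coefficient beta_t = beta_gp * log(|X| * t^2 * pi^2)."""
    return beta_gp * np.log(n_arms * t**2 * np.pi**2)

# Example: score one random user/item pair and compute the round-10 coefficient
# for an arm set of 3677 items (the MovieLens 1M item count quoted above).
rng = np.random.default_rng(0)
c, theta = rng.normal(size=5), rng.normal(size=5)
print(similarity(c, theta), beta_t(t=10, n_arms=3677))
```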