Gaussian Process Bandits for Top-k Recommendations

Authors: Mohit Yadav, Cameron Musco, Daniel R. Sheldon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Additionally, empirical results using a bandit simulator demonstrate that the proposed algorithm outperforms other baselines across various scenarios." "This section empirically evaluates the proposed GP-TopK bandit algorithms for top-k recommendations using a simulation based on the MovieLens dataset [4]."
Researcher Affiliation | Academia | Mohit Yadav, University of Massachusetts Amherst (ymohit@cs.umass.edu); Daniel Sheldon, University of Massachusetts Amherst (sheldon@cs.umass.edu); Cameron Musco, University of Massachusetts Amherst (cmusco@cs.umass.edu)
Pseudocode | Yes | Algorithm 1: Contextual Bandit Algorithm for Top-k Recommendations; Algorithm 2: Computing Weighted Convolutional Kendall Kernel; Algorithm 3: Computing Convolutional Kendall Kernel [10]. (A minimal Kendall-kernel sketch appears below the table.)
Open Source Code | No | "Our code can be accessed using this hyper-link." (The hyperlink itself is not present in the PDF text, so it does not provide concrete access to the code.)
Open Datasets | Yes | "using a simulation based on the MovieLens dataset [4]." [4] F. Maxwell Harper and Joseph A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, volume 5, pages 1–19, 2015.
Dataset Splits | Yes | "We consider a 1M variant of the MovieLens dataset, which contains 1 million ratings from 6040 users for 3677 items. Both context and item embeddings, i.e., c_u and θ_i, are 5-dimensional, optimized by considering the 5-fold performance on this dataset." (A sketch of such a fold construction appears below the table.)
Hardware Specification | Yes | "We utilized multiple NVIDIA Tesla M40 GPUs with 40 GB RAM on our in-house cluster for our experiments."
Software Dependencies | No | The paper describes its experimental setup and methods but does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | "For setting up the reward functions, we utilize a similarity function s(c, θ) := ς(a·(cᵀθ) − b) to measure similarity between any user and item embeddings, where a and b are scale and shift scalars, respectively. We set a and b to 6 and 0.3, respectively, to fully utilize the range of the similarity function, as assessed by evaluating its value for many arms. We set λ = 0.75 to emphasize relevance over diversity. For the ϵ-greedy baselines, various values of ϵ were considered, specifically ϵ ∈ {0.01, 0.05, 0.1}. For the MAB-UCB baseline, ... β_mab values within the set {0.1, 0.25, 0.5} ... For the parameters of the proposed GP-TopK bandit algorithms, we set β_t = β_gp · log(|X| t² π²) with β_gp ∈ {0.05, 0.1, 0.5}. The selection of σ for all variants is determined by optimizing the log-likelihood of the observed data after every 10 rounds, considering values in the set {0.01, 0.05, 0.1}. We use 10 restarts and 5 steps in each search direction for the local search, starting with 1000 initial candidates." (A sketch of the similarity function and confidence schedule appears below the table.)
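
The Pseudocode row references Algorithms 2 and 3, which compute a (weighted) convolutional Kendall kernel between rankings. The sketch below is not the paper's algorithm; it is a minimal illustration of the underlying idea, a standard Kendall-tau kernel that scores two rankings over the same items by counting concordant versus discordant item pairs. The paper's convolutional and weighted variants build on this pairwise-agreement notion for top-k lists.

```python
# Hypothetical sketch of a Kendall-tau kernel between two rankings,
# illustrating the pairwise-agreement idea behind Algorithms 2-3
# (the paper's weighted/convolutional variants are more involved).
from itertools import combinations

def kendall_kernel(ranking_a, ranking_b):
    """Normalized Kendall-tau kernel in [-1, 1] between two rankings
    over the same set of items (each ranking lists item ids,
    most-preferred first)."""
    items = set(ranking_a)
    assert items == set(ranking_b), "rankings must cover the same items"
    pos_a = {item: r for r, item in enumerate(ranking_a)}
    pos_b = {item: r for r, item in enumerate(ranking_b)}

    agree = 0
    for i, j in combinations(items, 2):
        # +1 if both rankings order the pair (i, j) the same way, else -1.
        same_order = (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) > 0
        agree += 1 if same_order else -1

    n_pairs = len(items) * (len(items) - 1) // 2
    return agree / n_pairs

# Identical rankings give 1.0; fully reversed rankings give -1.0.
print(kendall_kernel([3, 1, 2], [3, 1, 2]))   # 1.0
print(kendall_kernel([3, 1, 2], [2, 1, 3]))   # -1.0
```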
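
The Dataset Splits row states that the 5-dimensional context and item embeddings were chosen by 5-fold performance on the MovieLens 1M ratings. The sketch below shows one plausible way to build such folds; the synthetic ratings frame, column names, and use of scikit-learn's KFold are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the 5-fold evaluation used to pick 5-dimensional
# embeddings; the synthetic ratings below stand in for MovieLens 1M
# (ml-1m/ratings.dat), and scikit-learn's KFold is an assumed tool choice.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
ratings = pd.DataFrame({
    "user_id": rng.integers(0, 6040, size=10_000),
    "item_id": rng.integers(0, 3677, size=10_000),
    "rating": rng.integers(1, 6, size=10_000),
})

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(ratings)):
    train, val = ratings.iloc[train_idx], ratings.iloc[val_idx]
    # Fit 5-dimensional user/item embeddings (c_u, θ_i) on `train`,
    # e.g. by matrix factorization, and score them on `val`; the average
    # over the 5 folds guides the choice of embedding configuration.
    print(f"fold {fold}: {len(train)} train / {len(val)} val ratings")
```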
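
The Experiment Setup row quotes a sigmoid-shaped similarity s(c, θ) = ς(a·(cᵀθ) − b) with a = 6, b = 0.3, and a confidence schedule β_t = β_gp · log(|X| t² π²). The sketch below implements one reading of those (partially garbled) formulas; the exact grouping of terms in the paper may differ, so treat it as an illustration of the setup rather than the authors' code.

```python
# Hypothetical sketch of the reward-similarity function and the UCB
# confidence schedule quoted above; the exact formulas in the paper may
# group terms differently, so this is an illustration only.
import numpy as np

A, B = 6.0, 0.3          # scale and shift scalars from the setup
LAMBDA = 0.75            # relevance-vs-diversity trade-off (unused here)
BETA_GP = 0.1            # one of the grid values {0.05, 0.1, 0.5}

def similarity(c, theta):
    """Sigmoid-squashed similarity between a user context c and an item
    embedding theta (both 5-dimensional vectors)."""
    return 1.0 / (1.0 + np.exp(-(A * (c @ theta) - B)))

def beta_t(t, num_arms):
    """Confidence multiplier for round t >= 1, following
    beta_t = beta_gp * log(|X| * t^2 * pi^2)."""
    return BETA_GP * np.log(num_arms * t**2 * np.pi**2)

rng = np.random.default_rng(0)
c, theta = rng.normal(size=5), rng.normal(size=5)
print(similarity(c, theta), beta_t(t=10, num_arms=1000))
```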