Online Clustering of Bandits with Misspecified User Models

Authors: Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, John C.S. Lui

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real-world data show that our algorithms outperform previous algorithms. This section compares RCLUMB and RSCLUMB with CLUB [12], SCLUB [27], LinUCB with a single estimated vector for all users, LinUCB-Ind with separate estimated vectors for each user, and two modifications of LinUCB in [23], which we name RLinUCB and RLinUCB-Ind. We use averaged reward as the evaluation metric, where the average is taken over ten independent trials.
Researcher Affiliation | Academia | Zhiyong Wang, The Chinese University of Hong Kong, zywang21@cse.cuhk.edu.hk; Jize Xie, Shanghai Jiao Tong University, xjzzjl@sjtu.edu.cn; Xutong Liu, The Chinese University of Hong Kong, liuxt@cse.cuhk.edu.hk; Shuai Li, Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn; John C.S. Lui, The Chinese University of Hong Kong, cslui@cse.cuhk.edu.hk
Pseudocode | Yes | Algorithm 1: Robust Clustering of Misspecified Bandits Algorithm (RCLUMB)
Open Source Code | No | The paper does not contain any explicit statement about making the source code publicly available or a link to a code repository.
Open Datasets | Yes | We conduct experiments on the Yelp data and the 20m MovieLens data [17].
Dataset Splits | No | The paper describes data generation and experimental setup for synthetic and real-world datasets, but it does not specify explicit training, validation, or test dataset splits or percentages.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for running experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries).
Experiment Setup | Yes | Input: Deletion parameter α1, α2 > 0, f(T) = q 1+T, λ, β, ϵ > 0. We consider a setting with u = 1,000 users, m = 10 clusters and T = 10^6 rounds. The preference and feature vectors are in d = 50 dimensions with each entry drawn from a standard Gaussian distribution, and are normalized to vectors with ‖·‖_2 = 1 [27]. We fix an arm set with |A| = 1,000 items; at each round t, 20 items are randomly selected to form a set A_t for the user to choose from. We construct a matrix ϵ ∈ R^{1,000×1,000} in which each element ϵ(i, j) is drawn uniformly from the range (-0.2, 0.2) to represent the deviation.
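
The synthetic setup quoted in the Experiment Setup row can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code (the paper releases none). The function and key names are hypothetical. Three points are assumptions that the quoted text does not state: users are assigned to clusters uniformly at random, row i of the deviation matrix indexes users and column j indexes items, and the expected reward is the inner product of the user's cluster preference vector with the item feature vector plus the deviation entry. Observation noise is omitted.

```python
import math
import random


def unit_gaussian_vector(dim, rng):
    """Draw a standard Gaussian vector and scale it to unit L2 norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def make_environment(num_users=1000, num_clusters=10, num_items=1000,
                     dim=50, max_dev=0.2, seed=0):
    """Build a synthetic environment; defaults follow the quoted sizes."""
    rng = random.Random(seed)
    # One shared preference vector per cluster.
    cluster_thetas = [unit_gaussian_vector(dim, rng) for _ in range(num_clusters)]
    # ASSUMPTION: uniform random user-to-cluster assignment.
    user_cluster = [rng.randrange(num_clusters) for _ in range(num_users)]
    # Fixed arm set of item feature vectors.
    items = [unit_gaussian_vector(dim, rng) for _ in range(num_items)]
    # ASSUMPTION: deviation[i][j] is the misspecification for user i, item j.
    deviation = [[rng.uniform(-max_dev, max_dev) for _ in range(num_items)]
                 for _ in range(num_users)]
    return {
        "cluster_thetas": cluster_thetas,
        "user_cluster": user_cluster,
        "items": items,
        "deviation": deviation,
    }


def sample_candidates(env, rng, k=20):
    """Pick k distinct item indices to form the candidate set for one round."""
    return rng.sample(range(len(env["items"])), k)


def expected_reward(env, user, item):
    """ASSUMPTION: linear reward plus the user-item deviation, without noise."""
    theta = env["cluster_thetas"][env["user_cluster"][user]]
    x = env["items"][item]
    return sum(a * b for a, b in zip(theta, x)) + env["deviation"][user][item]
```

With the default arguments, `make_environment()` builds the full 1,000-user, 1,000-item instance. Passing smaller sizes (for example `num_users=5, num_items=30, dim=4`) gives a quick instance for checking a bandit implementation before running all 10^6 rounds.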