Online Clustering of Bandits with Misspecified User Models
Authors: Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, John C.S. Lui
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real-world data show our outperformance over previous algorithms. This section compares RCLUMB and RSCLUMB with CLUB [12], SCLUB [27], LinUCB with a single estimated vector for all users, LinUCB-Ind with separate estimated vectors for each user, and two modifications of LinUCB in [23] which we name as RLinUCB and RLinUCB-Ind. We use averaged reward as the evaluation metric, where the average is taken over ten independent trials. |
| Researcher Affiliation | Academia | Zhiyong Wang (The Chinese University of Hong Kong, zywang21@cse.cuhk.edu.hk); Jize Xie (Shanghai Jiao Tong University, xjzzjl@sjtu.edu.cn); Xutong Liu (The Chinese University of Hong Kong, liuxt@cse.cuhk.edu.hk); Shuai Li (Shanghai Jiao Tong University, shuaili8@sjtu.edu.cn); John C.S. Lui (The Chinese University of Hong Kong, cslui@cse.cuhk.edu.hk) |
| Pseudocode | Yes | Algorithm 1 Robust Clustering of Misspecified Bandits Algorithm (RCLUMB) |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code publicly available or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on the Yelp data and the 20m MovieLens data [17]. |
| Dataset Splits | No | The paper describes data generation and experimental setup for synthetic and real-world datasets, but it does not specify explicit training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for running experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries). |
| Experiment Setup | Yes | Input: Deletion parameter α1, α2 > 0, f(T) = √((1 + ln(1 + T))/(1 + T)), λ, β, ϵ > 0. We consider a setting with u = 1,000 users, m = 10 clusters and T = 10^6 rounds. The preference and feature vectors are in d = 50 dimensions with each entry drawn from a standard Gaussian distribution, and are normalized to vectors with ‖·‖₂ = 1 [27]. We fix an arm set with \|A\| = 1,000 items; at each round t, 20 items are randomly selected to form a set A_t for the user to choose from. We construct a matrix ϵ ∈ R^{1,000×1,000} in which each element ϵ(i, j) is drawn uniformly from the range (-0.2, 0.2) to represent the deviation. |
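
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code (the paper releases none). The function names, the uniform-at-random assignment of users to clusters, the (user, item) indexing of the deviation matrix, and the reward form "inner product plus deviation" are assumptions made for the sketch; only the sizes, the Gaussian-then-normalize sampling, the 20-item candidate sets, and the (-0.2, 0.2) deviation range come from the quoted text.

```python
import numpy as np


def make_synthetic_env(n_users=1000, n_clusters=10, dim=50,
                       n_items=1000, max_dev=0.2, seed=0):
    """Build a synthetic misspecified clustered-bandit environment (sketch)."""
    rng = np.random.default_rng(seed)

    def unit_rows(n):
        # Standard Gaussian entries, then normalize each row to L2 norm 1.
        v = rng.standard_normal((n, dim))
        return v / np.linalg.norm(v, axis=1, keepdims=True)

    cluster_thetas = unit_rows(n_clusters)   # one preference vector per cluster
    items = unit_rows(n_items)               # fixed arm set of feature vectors
    # Assumption: each user is assigned to a cluster uniformly at random.
    user_cluster = rng.integers(n_clusters, size=n_users)
    # Deviation matrix; entries are uniform within +/- max_dev.
    # Assumption: rows index users and columns index items.
    deviation = rng.uniform(-max_dev, max_dev, size=(n_users, n_items))
    return cluster_thetas, items, user_cluster, deviation


def sample_round(rng, items, n_candidates=20):
    """Pick the candidate item set for one round, without replacement."""
    idx = rng.choice(items.shape[0], size=n_candidates, replace=False)
    return idx, items[idx]


def expected_reward(env, user, item):
    """Assumed reward model: linear part plus the user-item deviation."""
    cluster_thetas, items, user_cluster, deviation = env
    theta = cluster_thetas[user_cluster[user]]
    return float(theta @ items[item] + deviation[user, item])
```

With the default arguments the sizes match the quoted setting (1,000 users, 10 clusters, 50 dimensions, 1,000 items); smaller values are convenient for quick checks. Observation noise and the T = 10^6 round loop are left out of the sketch.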