Mixed-Effects Contextual Bandits
Authors: Kyungbok Lee, Myunghee Cho Paik, Min-hwan Oh, Gi-Soo Kim
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide numerical experiments demonstrating the advantage of our proposed algorithm, supporting the theoretical claims. Numerical Experiments: Simulation Data. We compare the cumulative regret of the following algorithms: (i) C2UCB (Qin, Chen, and Zhu 2014), (ii) the proposed ME-CUCB1 with true D and σ², (iii) the proposed ME-CUCB2 with estimated D and σ². |
| Researcher Affiliation | Collaboration | Kyungbok Lee¹, Myunghee Cho Paik¹,², Min-hwan Oh³, Gi-Soo Kim⁴*; ¹ Department of Statistics, Seoul National University; ² Shepherd23 Inc.; ³ Graduate School of Data Science, Seoul National University; ⁴ Department of Industrial Engineering, Ulsan National Institute of Science and Technology |
| Pseudocode | Yes | Algorithm 1: Mixed-Effects Contextual UCB1 (ME-CUCB1) Algorithm 2: Mixed-Effects Contextual UCB2 (ME-CUCB2) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The MovieLens 10M dataset (Harper and Konstan 2015) contains 10 million triplets of users, movies, and the ratings from 0 to 5 across 10,681 movies. |
| Dataset Splits | Yes | We split the dataset into train/test sets by 8:2. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions several algorithms and methods (e.g., C2UCB, LinUCB, EM algorithm, PMF) but does not specify any software names with version numbers that would be necessary to replicate the experiment. |
| Experiment Setup | Yes | For C2UCB we restrict the super-arms to K arms containing m context vectors. All three have α as a hyperparameter to control the exploration rate. We run the experiments with $\alpha \in \{10^{-3}, 2 \times 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ and choose the value with smallest average cumulative regret. We run c = 10 random exploration rounds. We run c = 20 exploration rounds and use EM algorithm for ME-CUCB2 to estimate $\hat{D}_t$ and $\hat{\sigma}_t^2$. |
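
The Experiment Setup row describes choosing α by the smallest average cumulative regret over a fixed grid. Since no source code is released, the following is only a minimal sketch of that selection step, not the authors' implementation. The grid values assume the garbled exponents in the extracted text are negative powers of ten; `select_alpha`, `toy_run`, `n_reps`, and `seed` are hypothetical names, and `toy_run` is a stand-in that returns a made-up regret value rather than running C2UCB or ME-CUCB.

```python
import math
import random
from statistics import mean

# Assumed reading of the alpha grid from the Experiment Setup row.
ALPHA_GRID = [1e-3, 2e-3, 1e-2, 1e-1, 1.0, 10.0]


def select_alpha(run_once, grid=ALPHA_GRID, n_reps=5, seed=0):
    """Pick the alpha with the smallest average cumulative regret.

    run_once(alpha, rng) is any callable that runs one bandit experiment
    and returns its final cumulative regret as a float.
    """
    avg_regret = {}
    for alpha in grid:
        # Reuse the same seeds for every alpha so runs are comparable.
        regrets = [run_once(alpha, random.Random(seed + r)) for r in range(n_reps)]
        avg_regret[alpha] = mean(regrets)
    best_alpha = min(avg_regret, key=avg_regret.get)
    return best_alpha, avg_regret


def toy_run(alpha, rng):
    """Hypothetical stand-in for a bandit run (NOT a real algorithm).

    The fake regret is smallest near alpha = 0.1, plus a little noise.
    """
    return (math.log10(alpha) + 1.0) ** 2 + rng.uniform(0.0, 0.01)


best_alpha, avg_regret = select_alpha(toy_run)
print("selected alpha:", best_alpha)
```

To use this with a real algorithm, `toy_run` would be replaced by a function that runs the bandit for the full horizon with the given α and returns its cumulative regret; the number of repetitions per α is not stated in the row above and is an assumption here.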