Mixed-Effects Contextual Bandits

Authors: Kyungbok Lee, Myunghee Cho Paik, Min-hwan Oh, Gi-Soo Kim

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide numerical experiments demonstrating the advantage of our proposed algorithm, supporting the theoretical claims. Numerical Experiments, Simulation Data: We compare the cumulative regret of the following algorithms: (i) C2UCB (Qin, Chen, and Zhu 2014); (ii) the proposed ME-CUCB1 with true $D$ and $\sigma^2$; (iii) the proposed ME-CUCB2 with estimated $D$ and $\sigma^2$.
Researcher Affiliation | Collaboration | Kyungbok Lee¹, Myunghee Cho Paik¹,², Min-hwan Oh³, Gi-Soo Kim⁴*. ¹Department of Statistics, Seoul National University; ²Shepherd23 Inc.; ³Graduate School of Data Science, Seoul National University; ⁴Department of Industrial Engineering, Ulsan National Institute of Science and Technology
Pseudocode | Yes | Algorithm 1: Mixed-Effects Contextual UCB1 (ME-CUCB1); Algorithm 2: Mixed-Effects Contextual UCB2 (ME-CUCB2)
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The MovieLens 10M dataset (Harper and Konstan 2015) contains 10 million triplets of users, movies, and the ratings from 0 to 5 across 10,681 movies.
Dataset Splits | Yes | We split the dataset into train/test sets by 8:2.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions several algorithms and methods (e.g., C2UCB, LinUCB, EM algorithm, PMF) but does not specify any software names with version numbers that would be necessary to replicate the experiment.
Experiment Setup | Yes | For C2UCB we restrict the super-arms to $K$ arms containing $m$ context vectors. All three have $\alpha$ as a hyperparameter to control the exploration rate. We run the experiments with $\alpha \in \{10^{-3}, 2 \times 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ and choose the value with smallest average cumulative regret. We run $c = 10$ random exploration rounds. We run $c = 20$ exploration rounds and use EM algorithm for ME-CUCB2 to estimate $\hat{D}_t$ and $\hat{\sigma}_t^2$.
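
The α-selection step described in the Experiment Setup row (run every candidate α, keep the one with the smallest average cumulative regret) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code, since the paper releases none. A plain single-arm LinUCB learner stands in for C2UCB and ME-CUCB; the synthetic linear reward model, the names `run_linucb` and `select_alpha`, and all default sizes are hypothetical. Only the α grid (read with negative exponents) and the c = 10 random exploration rounds are taken from the setup description.

```python
import numpy as np

# Candidate exploration rates, as read from the setup row (negative exponents assumed).
ALPHA_GRID = [1e-3, 2e-3, 1e-2, 1e-1, 1.0, 10.0]


def run_linucb(alpha, seed, horizon=200, n_arms=5, dim=3, c=10, noise_sd=0.1):
    """Cumulative regret of a plain LinUCB stand-in on synthetic linear rewards.

    Hypothetical setup: this is NOT ME-CUCB or C2UCB, only a simple learner
    that exposes the same exploration hyperparameter alpha.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)
    theta /= np.linalg.norm(theta)          # unknown true parameter
    gram = np.eye(dim)                      # ridge-regularised Gram matrix
    b = np.zeros(dim)                       # running sum of reward * context
    regret = 0.0
    for t in range(horizon):
        contexts = rng.normal(size=(n_arms, dim))   # fresh contexts each round
        means = contexts @ theta
        if t < c:
            arm = int(rng.integers(n_arms))         # c random exploration rounds
        else:
            gram_inv = np.linalg.inv(gram)
            theta_hat = gram_inv @ b
            # Confidence width sqrt(x^T V^{-1} x) for every arm.
            width = np.sqrt(np.einsum("kd,de,ke->k", contexts, gram_inv, contexts))
            arm = int(np.argmax(contexts @ theta_hat + alpha * width))
        reward = means[arm] + noise_sd * rng.normal()
        gram += np.outer(contexts[arm], contexts[arm])
        b += reward * contexts[arm]
        regret += means.max() - means[arm]
    return regret


def select_alpha(grid=ALPHA_GRID, n_runs=5):
    """Return the alpha with the smallest average cumulative regret, plus all averages."""
    avg = {a: float(np.mean([run_linucb(a, s) for s in range(n_runs)])) for a in grid}
    best = min(avg, key=avg.get)
    return best, avg


if __name__ == "__main__":
    best_alpha, averages = select_alpha()
    for a, r in averages.items():
        print(f"alpha={a:g}: average cumulative regret {r:.2f}")
    print("selected alpha:", best_alpha)
```

Each α is evaluated on the same seeds, so the comparison is not confounded by different random environments; the number of repetitions (`n_runs`) is an illustrative choice, not a value reported in the paper.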