UCB-based Algorithms for Multinomial Logistic Regression Bandits

Authors: Sanae Amani, Christos Thrampoulidis

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present numerical simulations to complement and confirm our theoretical findings. We evaluate the performance of MNL-UCB on synthetic data.
Researcher Affiliation | Academia | Sanae Amani, University of California, Los Angeles (samani@ucla.edu); Christos Thrampoulidis, University of British Columbia (cthrampo@ece.ubc.ca)
Pseudocode | Yes | Algorithm 1: MNL-UCB — 1: for t = 1, ..., T do; 2: Compute Θ̂_t as in (21); 3: Compute x_t := arg max_{x ∈ D} ρ^T z(x, Θ̂_t) + ε_t(x), with ε_t(x) defined in (22); 4: Play x_t and observe y_t.
Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository was found.
Open Datasets | No | We evaluate the performance of MNL-UCB on synthetic data.
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or predefined splits) was provided. The paper mentions using 'synthetic data' and '20 realizations' but no explicit train/validation/test splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided.
Software Dependencies | No | No specific ancillary software details with version numbers were provided.
Experiment Setup | Yes | In all experiments, we used the upper bound in (26) to compute the exploration bonus ε_t(x). We evaluate the performance of MNL-UCB on synthetic data. All results shown depict averages over 20 realizations, for which we have chosen δ = 0.01, d = 2, and T = 1000. We considered time-independent decision sets D of 20 arms in R^2 and the reward vector ρ = [1, ..., K]^T. Moreover, the arms and the parameter rows θ*_i are drawn from N(0, I_d) and N(0, I_d/K), respectively.
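The pseudocode and experiment setup above can be sketched together in a short simulation. This is a minimal, hypothetical reconstruction, not the paper's implementation: the estimate Θ̂_t from (21) is replaced by a few gradient steps toward the multinomial-logistic MLE, the exploration bonus from (22) is replaced by a generic linear-UCB-style norm bonus with an assumed scale of 0.5, and K = 3 outcomes are assumed; only d = 2, the 20 arms drawn from N(0, I_d), the parameters drawn from N(0, I_d/K), and ρ = [1, ..., K]^T come from the paper's stated setup.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, n_arms, T = 2, 3, 20, 500        # paper: d = 2, 20 arms; K and T shortened here
arms = rng.normal(size=(n_arms, d))               # arms drawn from N(0, I_d)
theta_star = rng.normal(size=(K, d)) / np.sqrt(K) # rows theta*_i ~ N(0, I_d / K)
rho = np.arange(1, K + 1, dtype=float)            # reward vector rho = [1, ..., K]^T

def z(x, Theta):
    """Multinomial-logit probabilities of outcomes 1..K for arm x.
    Outcome 0 (zero reward) takes the remaining probability mass."""
    logits = Theta @ x
    m = logits.max()
    e = np.exp(logits - m)
    return e / (np.exp(-m) + e.sum())  # baseline outcome 0 has logit 0

def nll_grad(Theta, X, Y):
    """Gradient of the negative log-likelihood over observed (x, one-hot y) pairs."""
    G = np.zeros_like(Theta)
    for x, y in zip(X, Y):
        G += np.outer(z(x, Theta) - y, x)
    return G

Theta_hat = np.zeros((K, d))
V = np.eye(d)                  # Gram matrix for the (assumed) norm-based bonus
X_hist, Y_hist, rewards = [], [], []
for t in range(T):
    # Optimistic arm choice: estimated reward plus exploration bonus (stand-in for (22))
    bonus = 0.5 * np.sqrt(np.einsum("ad,de,ae->a", arms, np.linalg.inv(V), arms))
    scores = np.array([rho @ z(x, Theta_hat) for x in arms]) + bonus
    x = arms[int(np.argmax(scores))]

    # Play x_t and observe the multinomial outcome y_t
    p = z(x, theta_star)
    outcome = int(rng.choice(K + 1, p=np.concatenate([[1.0 - p.sum()], p])))
    y = np.zeros(K)
    if outcome > 0:
        y[outcome - 1] = 1.0
    rewards.append(float(rho @ y))

    # Update history and re-fit Theta_hat (stand-in for the estimate in (21))
    X_hist.append(x); Y_hist.append(y)
    V += np.outer(x, x)
    for _ in range(10):
        Theta_hat -= 0.1 * nll_grad(Theta_hat, X_hist, Y_hist) / len(X_hist)
```

Averaging such runs over 20 realizations, as the paper describes, would then give the regret curves reported in its figures.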