UCB-based Algorithms for Multinomial Logistic Regression Bandits
Authors: Sanae Amani, Christos Thrampoulidis
NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical simulations to complement and confirm our theoretical findings. We evaluate the performance of MNL-UCB on synthetic data. |
| Researcher Affiliation | Academia | Sanae Amani, University of California, Los Angeles (samani@ucla.edu); Christos Thrampoulidis, University of British Columbia (cthrampo@ece.ubc.ca) |
| Pseudocode | Yes | Algorithm 1 (MNL-UCB): for $t = 1, \dots, T$ do: compute $\theta_t$ as in (21); compute $x_t := \arg\max_{x \in \mathcal{D}} \rho^\top z(x, \theta_t) + \epsilon_t(x)$, with $\epsilon_t(x)$ defined in (22); play $x_t$ and observe $y_t$. |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository was found. |
| Open Datasets | No | We evaluate the performance of MNL-UCB on synthetic data. |
| Dataset Splits | No | No specific dataset split information (percentages, sample counts, or predefined splits) was provided. The paper mentions using 'synthetic data' and '20 realizations' but no explicit train/validation/test splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided. |
| Software Dependencies | No | No specific ancillary software details with version numbers were provided. |
| Experiment Setup | Yes | In all experiments, we used the upper bound in (26) to compute the exploration bonus $\epsilon_t(x)$. We evaluate the performance of MNL-UCB on synthetic data. All the results shown depict averages over 20 realizations, for which we have chosen $\delta = 0.01$, $d = 2$, and $T = 1000$. We considered time-independent decision sets $\mathcal{D}$ of 20 arms in $\mathbb{R}^2$ and the reward vector $\rho = [1, \dots, K]^\top$. Moreover, the arms and $\bar{\theta}_{*i}$ are drawn from $\mathcal{N}(0, I_d)$ and $\mathcal{N}(0, I_d/K)$, respectively. |
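
The synthetic setup and the arm-selection step quoted in the table can be sketched roughly as follows. This is a minimal illustration, not the authors' code (none was released). It assumes the standard multinomial logit form in which outcome 0 is a zero-logit baseline, picks `K = 3` arbitrarily because the table does not give a value, and treats `bonus` as a hypothetical stand-in for the exploration bonus of equation (22), which is not reproduced here. All function names are illustrative.

```python
import math
import random


def mnl_probs(x, theta):
    """Probabilities of outcomes 1..K; outcome 0 is assumed to have logit 0."""
    logits = [sum(t * v for t, v in zip(theta_i, x)) for theta_i in theta]
    shift = max(0.0, max(logits))  # subtract the largest logit for stability
    exps = [math.exp(l - shift) for l in logits]
    denom = math.exp(-shift) + sum(exps)  # baseline outcome plus outcomes 1..K
    return [e / denom for e in exps]


def expected_reward(x, theta, rho):
    """Inner product of the reward vector with the outcome probabilities."""
    return sum(r * p for r, p in zip(rho, mnl_probs(x, theta)))


def make_instance(num_arms=20, d=2, K=3, seed=0):
    """Draw arms from N(0, I_d) and per-outcome parameters from N(0, I_d / K)."""
    rng = random.Random(seed)
    arms = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_arms)]
    std = math.sqrt(1.0 / K)  # covariance I_d / K means std 1 / sqrt(K)
    theta = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(K)]
    rho = [float(i) for i in range(1, K + 1)]  # reward vector [1, ..., K]
    return arms, theta, rho


def select_arm(arms, theta_hat, rho, bonus):
    """UCB-style choice: estimated reward plus a caller-supplied bonus."""
    scores = [expected_reward(x, theta_hat, rho) + bonus(x) for x in arms]
    return max(range(len(arms)), key=scores.__getitem__)


arms, theta_star, rho = make_instance()
# With the true parameter and a zero bonus, selection returns the optimal arm.
best = select_arm(arms, theta_star, rho, bonus=lambda x: 0.0)
```

In a full run, `theta_hat` would be replaced by the estimate from equation (21) and `bonus` by the quantity in equation (22), and the loop would be repeated for `T = 1000` rounds over 20 independent realizations.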