Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

UCB-based Algorithms for Multinomial Logistic Regression Bandits

Authors: Sanae Amani, Christos Thrampoulidis

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We present numerical simulations to complement and confirm our theoretical findings. We evaluate the performance of MNL-UCB on synthetic data.
Researcher Affiliation Academia Sanae Amani University of California, Los Angeles EMAIL Christos Thrampoulidis University of British Columbia EMAIL
Pseudocode Yes Algorithm 1: MNL-UCB 1: for t = 1, . . . , T do 2: Compute the estimate Θ_t as in (21). 3: Compute x_t := arg max_{x ∈ D} ρᵀ z(x, Θ_t) + γ_t(x) with γ_t(x) defined in (22). 4: Play x_t and observe y_t.
Open Source Code No No explicit statement about providing open-source code or a link to a code repository was found.
Open Datasets No We evaluate the performance of MNL-UCB on synthetic data.
Dataset Splits No No specific dataset split information (percentages, sample counts, or predefined splits) was provided. The paper mentions using 'synthetic data' and '20 realizations' but no explicit train/validation/test splits.
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided.
Software Dependencies No No specific ancillary software details with version numbers were provided.
Experiment Setup Yes In all experiments, we used the upper bound in (26) to compute the exploration bonus γ_t(x). We evaluate the performance of MNL-UCB on synthetic data. All the results shown depict averages over 20 realizations, for which we have chosen δ = 0.01, d = 2, and T = 1000. We considered time-independent decision sets D of 20 arms in R^2 and the reward vector ρ = [1, . . . , K]ᵀ. Moreover, the arms and the θ_i are drawn from N(0, I_d) and N(0, I_d/K), respectively.
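
The synthetic setup described in the experiment row above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the helper names (`outcome_probs`, `expected_reward`) and the choice of K = 3 classes are our assumptions; only d = 2, the 20-arm decision set, and the N(0, I_d) / N(0, I_d/K) sampling come from the paper's description.

```python
import numpy as np

# Hypothetical sketch of the paper's synthetic setup: d = 2, a decision set
# of 20 arms, arms ~ N(0, I_d), parameter vectors theta_i ~ N(0, I_d / K).
# K = 3 is an illustrative choice; the paper does not pin K down here.
rng = np.random.default_rng(0)

d = 2          # feature dimension (from the paper)
n_arms = 20    # size of the decision set D (from the paper)
K = 3          # number of non-reference outcome classes (assumed)
rho = np.arange(1, K + 1, dtype=float)  # reward vector [1, ..., K]^T

arms = rng.normal(0.0, 1.0, size=(n_arms, d))            # arms ~ N(0, I_d)
theta = rng.normal(0.0, 1.0 / np.sqrt(K), size=(K, d))   # theta_i ~ N(0, I_d/K)

def outcome_probs(x, theta):
    """Multinomial-logit probabilities over K+1 outcomes, with class 0 as
    the reference class (logit fixed at 0)."""
    logits = np.concatenate(([0.0], theta @ x))
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    return exp / exp.sum()

def expected_reward(x, theta, rho):
    """Expected reward rho^T z(x, theta), where z stacks the probabilities
    of the K non-reference classes."""
    return rho @ outcome_probs(x, theta)[1:]

# With theta known, the best arm maximizes the expected reward; MNL-UCB
# instead plays arg max of an estimate plus an exploration bonus.
best_arm = max(range(n_arms), key=lambda i: expected_reward(arms[i], theta, rho))
```

A full MNL-UCB run would replace the known `theta` with the estimate from (21) and add the bonus γ_t(x) from (22) inside the arg max, then repeat for T = 1000 rounds averaged over 20 realizations.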