Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Authors: Min-hwan Oh, Garud Iyengar

AAAI 2021, pp. 9205-9213

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of our proposed algorithms, UCB-MNL (Algorithm 1) and DBL-MNL (Algorithm 2), in numerical experiments. In our evaluations, we report the cumulative regret for each round t ∈ {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances.
Researcher Affiliation | Academia | Min-hwan Oh, Seoul National University, Seoul, South Korea; Garud Iyengar, Columbia University, New York, USA
Pseudocode | Yes | Algorithm 1 UCB-MNL; Algorithm 2 DBL-MNL
Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement about its release.
Open Datasets | No | In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., r_ti = 1 for all i and t.
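The synthetic setup described above (θ drawn uniformly from [0, 1]^d, uniform revenues) can be sketched as follows. This is a minimal illustration, not the authors' code: the number of items N, the dimension d, and the Gaussian item features are assumptions, since the report does not specify how features are generated. Under the standard multinomial logit (MNL) model with an outside (no-purchase) option and r_ti = 1, the expected revenue of an assortment equals its total purchase probability.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N = 5, 10  # assumed dimension and number of candidate items

# Underlying parameter theta: each element uniform in [0, 1], fixed per instance
# but unknown to the algorithms.
theta = rng.uniform(0.0, 1.0, size=d)

# Item feature vectors (assumed standard Gaussian; not specified in the report).
X = rng.normal(size=(N, d))

def mnl_choice_probs(assortment, X, theta):
    """MNL choice probabilities over an assortment, with an outside option."""
    utilities = np.exp(X[assortment] @ theta)
    denom = 1.0 + utilities.sum()  # the 1.0 is the no-purchase option
    return utilities / denom

# With uniform revenues r_ti = 1, expected revenue is the total purchase probability.
probs = mnl_choice_probs([0, 1, 2], X, theta)
expected_revenue = probs.sum()
```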
Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It describes generating synthetic data and evaluating algorithms over a time horizon T.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | In our evaluations, we report the cumulative regret for each round t ∈ {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances. In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., r_ti = 1 for all i and t. We compare the performances of the proposed algorithms with those of the state-of-the-art Thompson sampling based algorithms, TS-MNL and optimistic TS-MNL, proposed in Oh and Iyengar (2019). Additionally, we evaluate the performance of the provably optimal but impractical algorithm, sup CB-MNL (see Algorithm 5 in the appendix), which is based on the Auer framework. Figure 1 shows that the performance of UCB-MNL and DBL-MNL is superior or comparable to the state-of-the-art Thompson sampling methods. Moreover, the runtime evaluation shows that DBL-MNL is significantly faster than the other methods due to its logarithmic number of parameter updates.
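The evaluation protocol above (cumulative regret per round, averaged over 20 independent instances) can be sketched as below. This is a hedged illustration of the regret bookkeeping only: a placeholder random policy stands in for UCB-MNL, DBL-MNL, and the Thompson sampling baselines, and the horizon T, item count N, assortment size K, and Gaussian features are all assumptions not stated in the report. The oracle assortment is found by brute force, which is feasible only for small N and K.

```python
import numpy as np
from itertools import combinations

def expected_revenue(S, X, theta):
    """Expected revenue of assortment S under MNL with uniform revenues r_ti = 1."""
    u = np.exp(X[list(S)] @ theta)
    return u.sum() / (1.0 + u.sum())

def run_instance(T, policy, d=5, N=10, K=3, seed=0):
    """One problem instance: theta fixed throughout; cumulative regret per round."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=d)   # unknown to the algorithm
    X = rng.normal(size=(N, d))
    assortments = list(combinations(range(N), K))
    # Oracle benchmark: best fixed assortment under the true theta (brute force).
    opt = max(expected_revenue(S, X, theta) for S in assortments)
    regret, cum = np.empty(T), 0.0
    for t in range(T):
        S = policy(rng, assortments)        # placeholder for UCB-MNL / DBL-MNL
        cum += opt - expected_revenue(S, X, theta)
        regret[t] = cum
    return regret

def random_policy(rng, assortments):
    """Illustrative baseline: pick an assortment uniformly at random."""
    return assortments[rng.integers(len(assortments))]

# Average cumulative regret over 20 independent instances, as in the report.
T = 100
curves = np.stack([run_instance(T, random_policy, seed=i) for i in range(20)])
avg_regret = curves.mean(axis=0)
```

A real UCB-MNL implementation would replace `random_policy` with an assortment chosen to maximize an optimistic revenue estimate built from an MLE of θ plus a confidence bonus; the regret accounting is unchanged.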