Multinomial Logit Contextual Bandits: Provable Optimality and Practicality
Authors: Min-hwan Oh, Garud Iyengar (pp. 9205–9213)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of our proposed algorithms: UCB-MNL (Algorithm 1) and DBL-MNL (Algorithm 2) in numerical experiments. In our evaluations, we report the cumulative regret for each round t ∈ {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances. |
| Researcher Affiliation | Academia | Min-hwan Oh1 and Garud Iyengar2 1Seoul National University, Seoul, South Korea 2Columbia University, New York, USA |
| Pseudocode | Yes | Algorithm 1 UCB-MNL; Algorithm 2 DBL-MNL |
| Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement about its release. |
| Open Datasets | No | In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., r_ti = 1 for all i and t. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It describes generating synthetic data and evaluating algorithms over a time horizon T. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our evaluations, we report the cumulative regret for each round t ∈ {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances. In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., r_ti = 1 for all i and t. We compare the performances of the proposed algorithms with those of the state-of-the-art Thompson sampling based algorithms, TS-MNL and optimistic TS-MNL, proposed in Oh and Iyengar (2019). Additionally, we evaluate the performance of the provably optimal but impractical algorithm, supCB-MNL (see Algorithm 5 in the appendix), that is based on the Auer framework. Figure 1 shows that the performances of UCB-MNL and DBL-MNL are superior to or comparable to the state-of-the-art Thompson sampling methods. Moreover, the runtime evaluation shows that DBL-MNL is significantly faster than the other methods due to its logarithmic number of parameter updates. |
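The synthetic setup described above (θ sampled elementwise from Uniform[0, 1], multinomial logit choice over an offered assortment, uniform revenues r_ti = 1) can be sketched as follows. This is a minimal illustration, not the authors' code: the feature distribution, dimension d, and assortment size K are illustrative assumptions, and the helper names (`mnl_choice_probs`, `make_instance`) are hypothetical.

```python
import numpy as np

def make_instance(d, rng):
    """One synthetic problem instance: theta ~ Uniform[0, 1]^d,
    fixed for the whole run but unknown to the learning algorithm."""
    return rng.uniform(0.0, 1.0, size=d)

def mnl_choice_probs(theta, X):
    """MNL purchase probabilities for an assortment with feature rows X.
    The no-purchase option has utility 0, contributing exp(0) = 1
    to the normalizing denominator."""
    utilities = X @ theta
    expu = np.exp(utilities)
    return expu / (1.0 + expu.sum())

rng = np.random.default_rng(0)
d, K = 5, 4                      # feature dimension, assortment size (illustrative)
theta = make_instance(d, rng)
X = rng.normal(size=(K, d))      # item features for one round (illustrative)
p = mnl_choice_probs(theta, X)

# With uniform revenues r_ti = 1, the expected revenue of an assortment
# is simply its total purchase probability.
expected_revenue = p.sum()
```

Cumulative regret would then be accumulated over t ∈ {1, ..., T} as the gap between the expected revenue of the optimal assortment (computable from the true θ) and that of the offered assortment, averaged over 20 such independent instances.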