Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Multinomial Logit Contextual Bandits: Provable Optimality and Practicality
Authors: Min-hwan Oh, Garud Iyengar9205-9213
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performances of our proposed algorithms: UCB-MNL (Algorithm 1) and DBL-MNL (Algorithm 2) in numerical experiments. In our evaluations, we report the cumulative regret for each round t {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances. |
| Researcher Affiliation | Academia | Min-hwan Oh1 and Garud Iyengar2 1Seoul National University, Seoul, South Korea 2Columbia University, New York, USA |
| Pseudocode | Yes | Algorithm 1 UCB-MNL; Algorithm 2 DBL-MNL |
| Open Source Code | No | The paper does not provide a direct link to source code or an explicit statement about its release. |
| Open Datasets | No | In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., rti = 1 for all i and t. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. It describes generating synthetic data and evaluating algorithms over a time horizon T. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our evaluations, we report the cumulative regret for each round t {1, ..., T}. For each experimental configuration, we evaluate the algorithms on 20 independent instances and report average performances. In each instance, the underlying parameter θ is sampled from the d-dimensional uniform distribution, with each element of θ uniformly distributed in [0, 1]. The underlying parameters are fixed during each problem instance but not known to the algorithms. For efficient evaluations, we consider uniform revenues, i.e., rti = 1 for all i and t. We compare the performances of the proposed algorithms with those of the state-of-the-art Thompson sampling based algorithms, TS-MNL and optimistic TS-MNL, proposed in Oh and Iyengar (2019). Additionally, we evaluate the performance of the provably optimal but impractical algorithm, sup CB-MNL (see Algorithm 5 in the appendix), that is based on the Auer-framework. Figure 1 shows that the performances of UCB-MNL and DBL-MNL are superior to or comparable to the state-of-the-art Thompson sampling methods. Moreover, the runtime evaluation shows that DBL-MNL is significantly faster than the other methods due to its logarithmic number of parameter updates. |