DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits

Authors: Mridul Agarwal, Vaneet Aggarwal, Abhishek Kumar Umrawal, Chris Quinn (pp. 6557-6565)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also empirically evaluate the proposed algorithm, DART, comparing it to other state-of-the-art full-bandit-feedback CMAB algorithms. We first consider a linear setting, where the joint reward is simply the mean of the individual arm rewards. We then examine a setting where the joint reward is a quadratic function of the individual arm rewards, based on the problem of cross-selling item selection (Wong, Fu, and Wang 2003). Our algorithm significantly outperforms existing state-of-the-art algorithms while using only polynomial space and time complexity.
Researcher Affiliation | Academia | Mridul Agarwal 1, Vaneet Aggarwal 1, Abhishek Kumar Umrawal 1, Chris Quinn 2; 1 Purdue University, 2 Iowa State University
Pseudocode | Yes | Algorithm 1: DART(T, N, K)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that its source code is publicly available.
Open Datasets | No | The paper mentions generating data for simulations: 'individual arm rewards follows Bernoulli distribution with means sampled from U([0, 1])', but does not refer to a publicly available, named dataset with access information.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits; it primarily uses simulated data in a bandit setting, where data is generated on the fly rather than split from a fixed dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We used N = 45 and T = 10^6 for simulations. We chose K = 2, 4, 8 for easy construction of Hadamard matrices for the CSAR algorithm. We compare two different reward setups: the first has the joint reward as a linear function of the individual rewards, and the second has the joint reward as a quadratic function of the individual rewards. In each setup, individual arm rewards follow a Bernoulli distribution with means sampled from U([0, 1]). We run 25 independent iterations to plot the average regret and the maximum and minimum regret values of each algorithm.
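The simulation environment described above (N Bernoulli arms with means drawn from U([0, 1]), a size-K subset pulled per round, and only a joint reward observed) can be sketched as follows. This is a minimal illustration, not the paper's code: the horizon is shortened from T = 10^6 so it runs quickly, and the exact quadratic reward is not specified in the excerpt, so the quadratic form here is an assumed placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes from the paper's setup; T is shortened here (the paper uses T = 10^6).
N, K, T = 45, 4, 1000

# Individual arm means sampled from U([0, 1]), as described in the setup.
mu = rng.uniform(0.0, 1.0, size=N)

def pull(arms):
    """Draw Bernoulli individual rewards for the selected subset of arms."""
    return rng.binomial(1, mu[arms])

def linear_reward(x):
    # First setup: joint reward is the mean of the individual arm rewards.
    return x.mean()

def quadratic_reward(x):
    # Second setup (assumed form): a simple quadratic in the summed
    # individual rewards, scaled to [0, 1]; the paper's exact quadratic
    # is not reproduced here.
    s = x.sum()
    return s * s / (K * K)

# One full-bandit round: the learner picks K arms and observes only the
# joint reward, never the individual arm rewards.
arms = rng.choice(N, size=K, replace=False)
x = pull(arms)
print(linear_reward(x), quadratic_reward(x))
```

Under full-bandit feedback the per-arm vector `x` stays hidden from the learner; only the scalar joint reward is revealed, which is what makes the combinatorial problem hard.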