DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits
Authors: Mridul Agarwal, Vaneet Aggarwal, Abhishek Kumar Umrawal, Chris Quinn6557-6565
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also empirically evaluate the proposed algorithm DART, comparing it to other, state-of-the-art full-bandit feedback CMAB algorithms. We first consider a linear setting, where the joint reward as simply the mean of individual arm rewards. We also examine the setting where the joint reward is a quadratic function of individual arm rewards, based on the problem of cross-selling item selection (Raymond Chi-Wing Wong, Ada Wai-Chee Fu, and Wang 2003). Our algorithm significantly outperforms existing state of the art algorithms, while only using polynomial space and time complexity. |
| Researcher Affiliation | Academia | Mridul Agarwal1, Vaneet Aggarwal 1, Abhishek Kumar Umrawal1, Chris Quinn 2, 1 Purdue University 2 Iowa State University |
| Pseudocode | Yes | Algorithm 1 DART(T, N, K) |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | No | The paper mentions generating data for simulations: 'individual arm rewards follows Bernoulli distribution with means sampled from U([0, 1])' but does not refer to a publicly available, named dataset with access information. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, as it primarily uses simulated data within a bandit problem context where data is generated on the fly rather than split from a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We used N = 45 and T = 106 for simulations. We chose K = 2, 4, 8 for easy construction of Hadamard matrices for CSAR algorithm. We compare for two different reward setups. First setup has joint reward as linear function of individual rewards. Second setup has joint reward as a quadratic function of individual rewards. In each setup, individual arm rewards follows Bernoulli distribution with means sampled from U([0, 1]). We run 25 independent iterations to plot average regret and the maximum and minimum values of the regret of each algorithm. |