DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits

Authors: Mridul Agarwal, Vaneet Aggarwal, Abhishek Kumar Umrawal, Chris Quinn (pp. 6557-6565)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also empirically evaluate the proposed algorithm, DART, comparing it to other state-of-the-art full-bandit-feedback CMAB algorithms. We first consider a linear setting, where the joint reward is simply the mean of the individual arm rewards. We then examine a setting where the joint reward is a quadratic function of the individual arm rewards, based on the problem of cross-selling item selection (Wong, Fu, and Wang 2003). Our algorithm significantly outperforms existing state-of-the-art algorithms while using only polynomial space and time complexity.
Researcher Affiliation | Academia | Mridul Agarwal 1, Vaneet Aggarwal 1, Abhishek Kumar Umrawal 1, Chris Quinn 2; 1 Purdue University, 2 Iowa State University
Pseudocode | Yes | Algorithm 1: DART(T, N, K)
Open Source Code | No | The paper does not provide any explicit statement or link indicating that its source code is publicly available.
Open Datasets | No | The paper mentions generating data for simulations: 'individual arm rewards follows Bernoulli distribution with means sampled from U([0, 1])', but does not refer to a publicly available, named dataset with access information.
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits; it primarily uses simulated data in a bandit setting, where data is generated on the fly rather than split from a fixed dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We used N = 45 and T = 10^6 for simulations. We chose K = 2, 4, 8 for easy construction of Hadamard matrices for the CSAR algorithm. We compare two different reward setups: the first has the joint reward as a linear function of the individual rewards, and the second has the joint reward as a quadratic function of the individual rewards. In each setup, individual arm rewards follow a Bernoulli distribution with means sampled from U([0, 1]). We run 25 independent iterations to plot the average regret and the maximum and minimum regret values of each algorithm.
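The simulation environment described above (N Bernoulli arms with means drawn from U([0, 1]), a size-K subset pulled per round, and only a joint reward observed) can be sketched as follows. This is a minimal illustration, not the paper's code: the horizon is shortened from T = 10^6 so it runs quickly, and the exact quadratic reward is not specified in the excerpt, so the quadratic form here is an assumed placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes from the paper's setup; T is shortened here (the paper uses T = 10^6).
N, K, T = 45, 4, 1000

# Individual arm means sampled from U([0, 1]), as described in the setup.
mu = rng.uniform(0.0, 1.0, size=N)

def pull(arms):
    """Draw Bernoulli individual rewards for the selected subset of arms."""
    return rng.binomial(1, mu[arms])

def linear_reward(x):
    # First setup: joint reward is the mean of the individual arm rewards.
    return x.mean()

def quadratic_reward(x):
    # Second setup (assumed form): a simple quadratic in the summed
    # individual rewards, scaled to [0, 1]; the paper's exact quadratic
    # is not reproduced here.
    s = x.sum()
    return s * s / (K * K)

# One full-bandit round: the learner picks K arms and observes only the
# joint reward, never the individual arm rewards.
arms = rng.choice(N, size=K, replace=False)
x = pull(arms)
print(linear_reward(x), quadratic_reward(x))
```

Under full-bandit feedback the per-arm vector `x` stays hidden from the learner; only the scalar joint reward is revealed, which is what makes the combinatorial problem hard.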