Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Combinatorial Bandits for Maximum Value Reward Function under Value-Index Feedback

Authors: Yiliu Wang, Wei Chen, Milan Vojnović

ICLR 2024 | Venue PDF | LLM Run Details

Reproducibility variables, each with its classification result and the supporting LLM response:
Research Type: Experimental
LLM response: We demonstrate the effectiveness of our algorithm through experimental results. ... Our numerical results are presented in Section 5. ... We perform experiments to evaluate the performance of Algorithm 3.1 on specific problem instances. We compare our algorithm with three baseline methods: the well-known UCB algorithm treating each set of size k as one arm, the standard CUCB algorithm for semi-bandit CMAB (Chen et al., 2013), and DART (Agarwal et al., 2021b) for full-bandit CMAB. ... Figure 1 presents the regrets of Algorithm 3.1 and the three baseline methods across the three scenarios.
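As background for the comparison above, here is a minimal sketch of the value-index feedback model the experiments operate under: the learner plays a set S of k arms and observes only the maximum realized value together with the index of an arm attaining it. The two-point arm model below is an illustrative assumption, not the paper's exact D1-D3 instances from Appendix F.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-point arm model (an assumption for this sketch):
# arm i realizes value v[i] with probability p[i], and 0 otherwise.
v = np.arange(0.1, 1.0, 0.1)   # n = 9 arms with values 0.1, ..., 0.9
p = np.full(9, 0.3)

def play(S):
    """Play a set S of arm indices; return value-index feedback:
    the maximum realized value and the index of an arm attaining it."""
    S = np.asarray(S)
    realized = np.where(rng.random(S.size) < p[S], v[S], 0.0)
    j = int(np.argmax(realized))
    return float(realized[j]), int(S[j])

max_value, winner = play([2, 5, 8])   # feedback for the set {2, 5, 8}
```

Nothing beyond the (max value, index) pair is revealed, which is what separates this setting from semi-bandit feedback, where every selected arm's value would be observed.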
Researcher Affiliation: Collaboration
LLM response: Yiliu Wang, Allen Institute, Seattle, WA 98109, USA; Wei Chen, Microsoft Research, Beijing, China; Milan Vojnović, Department of Statistics, LSE, London, United Kingdom
Pseudocode: Yes
LLM response: Algorithm 3.1 CUCB algorithm for unknown ordering of values ... Algorithm 4.1 CUCB algorithm for arbitrary distributions with finite supports
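As orientation for these entries, the following is a generic sketch of the CUCB template that Algorithms 3.1 and 4.1 build on. The confidence radius, top-k oracle, and per-arm update below are standard textbook choices, not the paper's exact rules, which are tailored to value-index feedback; `sample_arm` is a hypothetical callback returning one draw of an arm's value.

```python
import numpy as np

def cucb(n, k, T, sample_arm):
    """Generic CUCB loop (a sketch of the template, not the paper's
    Algorithm 3.1): maintain per-arm UCB indices and hand them to an
    offline oracle each round. With a max-type reward and one scalar
    estimate per arm, top-k selection is a natural oracle choice."""
    counts = np.zeros(n)
    means = np.zeros(n)
    for t in range(1, T + 1):
        radius = np.sqrt(3.0 * np.log(t) / (2.0 * np.maximum(counts, 1)))
        ucb = np.where(counts > 0, means + radius, np.inf)  # unplayed arms first
        S = np.argsort(ucb)[-k:]          # offline oracle: top-k UCB arms
        for i in S:
            x = sample_arm(i)             # semi-bandit-style update; the
            counts[i] += 1                # paper's value-index feedback
            means[i] += (x - means[i]) / counts[i]  # reveals strictly less
    return means
```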
Open Source Code: Yes
LLM response: The code we use is available at: https://github.com/Sketch-EXP/kmax.
Open Datasets: No
LLM response: The paper defines three custom distributions (D1, D2, D3) for its experiments in Appendix F but does not state that these or any other datasets are publicly available, nor does it provide a link or citation for them. It describes the setup of arm distributions (e.g., v = (0.1, 0.2, ..., 0.9) and p such that p_i = 0.3).
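Although the exact D1-D3 parameters are deferred to Appendix F, the reported form (a value vector v and activation probabilities p) is enough to sketch how such an instance could be built and how the optimal set S* can be found by brute force at this problem size. The parameters below mirror the quoted example and are assumptions, not the paper's scenarios.

```python
from itertools import combinations
import numpy as np

# Two-point arms mirroring the quoted example (an assumption; the
# paper's D1-D3 scenarios differ and live in Appendix F).
v = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
p = np.full(9, 0.3)

def expected_max(S):
    """E[max of realized values over S] for independent two-point arms:
    the j-th largest value is the max exactly when its arm fires and
    every strictly larger arm in S realizes 0."""
    total, larger_all_zero = 0.0, 1.0
    for i in sorted(S, key=lambda i: v[i], reverse=True):
        total += larger_all_zero * p[i] * v[i]
        larger_all_zero *= 1.0 - p[i]
    return total

# Brute-force optimal set S* for k = 3 (feasible at n = 9: C(9,3) = 84).
S_star = max(combinations(range(9), 3), key=expected_max)
```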
Dataset Splits: No
LLM response: The paper does not specify explicit training/validation/test splits. It mentions running experiments for a time horizon T = 5000 and repeating them 20 times to average cumulative regrets, but it does not describe any partitioning of the data itself (arm outcomes) into training, validation, or test sets that would allow the data partitioning to be reproduced.
Hardware Specification: No
LLM response: The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No
LLM response: The paper does not provide specific version numbers for any software dependencies. It only mentions using 'the well-known UCB algorithm', 'the standard CUCB algorithm', and 'DART' without specifying their software implementations or versions.
Experiment Setup: Yes
LLM response: We consider settings with n = 9 arms and sets of cardinality k = 3. ... We run each experiment for a time horizon of T = 5000. In each round, we select arms according to the offline oracle and sample their values for updates. We compare the reward to that of the optimal set S*. We repeat the experiments 20 times and average the cumulative regrets. ... We tested on three different arm distributions such that a single arm changes from one scenario to another. We expect the algorithm to perform well in all three cases. For space reasons, we show the detailed setup in Appendix F.
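To make the quoted protocol concrete, here is a hedged sketch of the measurement loop: T = 5000 rounds of cumulative expected regret against the optimal set S*, repeated 20 times and averaged. It reuses the illustrative two-point arms from the earlier snippets in place of the Appendix F instances, and a random placeholder policy in place of the paper's Algorithm 3.1.

```python
from itertools import combinations
import numpy as np

# Illustrative two-point arms, as in the earlier sketches (assumption).
v = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
p = np.full(9, 0.3)

def expected_max(S):
    """E[max over S] for independent two-point arms (see earlier sketch)."""
    total, larger_all_zero = 0.0, 1.0
    for i in sorted(S, key=lambda i: v[i], reverse=True):
        total += larger_all_zero * p[i] * v[i]
        larger_all_zero *= 1.0 - p[i]
    return total

opt = max(expected_max(S) for S in combinations(range(9), 3))  # value of S*

def run_trial(select_set, T=5000):
    """One trial: T rounds of cumulative expected regret. `select_set(t)`
    is a hypothetical policy callback returning k = 3 arm indices for
    round t (the paper would use Algorithm 3.1 here)."""
    cum, regret = [], 0.0
    for t in range(T):
        regret += opt - expected_max(select_set(t))
        cum.append(regret)
    return np.array(cum)

# Placeholder policy: a fresh uniformly random size-3 set each round.
rng = np.random.default_rng(0)
avg_regret = np.mean([run_trial(lambda t: rng.choice(9, 3, replace=False))
                      for _ in range(20)], axis=0)  # 20 repetitions, averaged
```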