Combinatorial Bandits for Maximum Value Reward Function under Value-Index Feedback
Authors: Yiliu Wang, Wei Chen, Milan Vojnovic
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithm through experimental results. ... Our numerical results are presented in Section 5. ... We perform experiments to evaluate the performance of Algorithm 3.1 on specific problem instances. We compare our algorithm with three baseline methods: the well-known UCB algorithm treating each set of size k as one arm, the standard CUCB algorithm for semi-bandit CMAB (Chen et al., 2013), and DART (Agarwal et al., 2021b) for full-bandit CMAB. ... Results: Figure 1 presents the regrets of Algorithm 3.1 and three baseline methods across the three scenarios. |
| Researcher Affiliation | Collaboration | Yiliu Wang, Allen Institute, Seattle, WA 98109, USA (yiliu.wang@alleninstitute.org); Wei Chen, Microsoft Research, Beijing, China (weic@microsoft.com); Milan Vojnović, Department of Statistics, LSE, London, United Kingdom (m.vojnovic@lse.ac.uk) |
| Pseudocode | Yes | Algorithm 3.1 CUCB algorithm for unknown ordering of values ... Algorithm 4.1 CUCB algorithm for arbitrary distributions with finite supports |
| Open Source Code | Yes | The code we use is available at: https://github.com/Sketch-EXP/kmax. |
| Open Datasets | No | The paper defines three custom distributions (D1, D2, D3) for its experiments in Appendix F, but it does not state that these or any other datasets are publicly available, nor does it provide a link or citation for them. It only describes the arm distributions themselves (e.g., v = (0.1, 0.2, ..., 0.9) with p such that p_i = 0.3). |
| Dataset Splits | No | The paper does not specify training/validation/test splits. It mentions running experiments for a time horizon of T = 5000 and repeating them 20 times to average cumulative regrets; since this is an online bandit setting in which arm outcomes are sampled on the fly, there is no data partitioning to reproduce. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. It only mentions using 'the well-known UCB algorithm', 'standard CUCB algorithm', and 'DART' without specifying their software implementations or versions. |
| Experiment Setup | Yes | We consider settings with n = 9 arms and sets of cardinality k = 3. ... We run each experiment for a horizon time of T = 5000. In each round, we select arms according to the offline oracle and sample their values for updates. We compare the reward to that of the optimal set S*. We repeat the experiments 20 times and average the cumulative regrets. ... We tested on three different arm distributions such that a single arm changes from one scenario to another. We expect the algorithm to perform well for all three cases. For space reasons, we show the detailed setup in Appendix F. |
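
The setup rows above contain enough detail to reconstruct the simulated environment. Below is a minimal sketch, assuming (as the quoted setup suggests) that arm i yields value v[i] with probability p[i] and 0 otherwise, that the reward of a played set is the maximum realized value, and that the learner receives value-index feedback, i.e., the maximum value together with the arm that attained it. The helper names (`sample_outcomes`, `play`, `expected_max`) are our own, not identifiers from the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Setup quoted above: n = 9 arms, sets of cardinality k = 3, horizon T = 5000;
# one scenario uses values v = (0.1, 0.2, ..., 0.9) with p_i = 0.3 for every arm.
n, k, T = 9, 3, 5000
v = np.arange(1, n + 1) / 10.0
p = np.full(n, 0.3)

def sample_outcomes():
    """One round of arm outcomes: arm i yields v[i] w.p. p[i], else 0."""
    return np.where(rng.random(n) < p, v, 0.0)

def play(S, outcomes):
    """Value-index feedback for the max reward: (max value in S, winning arm)."""
    S = np.asarray(S)
    j = int(np.argmax(outcomes[S]))
    return float(outcomes[S][j]), int(S[j])

def expected_max(S, n_mc=100_000):
    """Monte Carlo estimate of E[max_{i in S} X_i] under the two-point model."""
    S = np.asarray(S)
    draws = np.where(rng.random((n_mc, len(S))) < p[S], v[S], 0.0)
    return float(draws.max(axis=1).mean())

# With a common p, the optimal size-k set is the k arms with the largest values.
S_star = sorted(np.argsort(v)[-k:].tolist())
```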
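For this two-point outcome model with distinct values, the Monte Carlo estimate above can be replaced by an exact expression: arm i attains the maximum exactly when it fires and every higher-valued arm in the set misses, so E[max_{i in S} X_i] = sum_{i in S} v_i p_i prod_{j in S : v_j > v_i} (1 - p_j). A small helper implementing this (again a sketch in our notation, not a function from the paper's code):

```python
def exact_expected_max(S):
    """Exact E[max_{i in S} X_i] for the two-point model with distinct values:
    arm i is the max iff it fires and all higher-valued arms in S miss."""
    total, survive = 0.0, 1.0
    for i in sorted(S, key=lambda i: v[i], reverse=True):
        total += v[i] * p[i] * survive
        survive *= 1.0 - p[i]
    return total

opt = exact_expected_max(S_star)   # benchmark reward of the optimal set
```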
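The first baseline quoted in the Research Type row treats every set of size k as a single arm and runs standard UCB over all C(9, 3) = 84 subsets. Here is a minimal sketch of that baseline, continuing the environment above; the UCB1 exploration term sqrt(2 ln t / N_a) is a conventional choice, not a constant reported in the paper.

```python
import math
from itertools import combinations

subsets = list(combinations(range(n), k))                 # 84 super-arms
mu = np.array([exact_expected_max(S) for S in subsets])   # per-set expected reward

counts = np.zeros(len(subsets))
means = np.zeros(len(subsets))
cum_regret = 0.0
for t in range(1, T + 1):
    if t <= len(subsets):                                  # play each super-arm once
        a = t - 1
    else:                                                  # UCB1 index
        a = int(np.argmax(means + np.sqrt(2.0 * math.log(t) / counts)))
    reward, _winner = play(subsets[a], sample_outcomes())  # only the max value is used
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]            # incremental mean update
    cum_regret += opt - mu[a]                              # pseudo-regret vs. S*
```

Repeating this loop 20 times with fresh seeds and averaging `cum_regret` would match the evaluation protocol described in the Experiment Setup row.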