Multi-armed Bandits with Compensation
Authors: Siwei Wang, Longbo Huang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present experimental results to demonstrate the performance of the algorithms. |
| Researcher Affiliation | Academia | Siwei Wang IIIS, Tsinghua University wangsw15@mails.tsinghua.edu.cn; Longbo Huang IIIS, Tsinghua University longbohuang@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: The UCB algorithm for KCMAB. Algorithm 2: The modified ε-greedy algorithm for KCMAB. Algorithm 3: The Modified Thompson Sampling Algorithm for KCMAB. Algorithm 4: Procedure Update. (A runnable sketch of the UCB variant follows the table.) |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes a simulated environment with a specific reward vector and number of time steps, but does not refer to a publicly available dataset with concrete access information (link, DOI, citation). |
| Dataset Splits | No | The paper describes a simulated game run for 10000 time steps and averaged over 1000 runs, but does not specify train, validation, or test dataset splits in the conventional sense of data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, that would be needed to replicate the experiment. |
| Experiment Setup | Yes | In our experiments, there are a total of nine arms with expected reward vector µ = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]. We run the game for T = 10000 time steps. [...] In our experiment, we choose = 20. [...] Here we choose to be 10, 15 and 20. (A reproduction harness using these settings follows the table.) |
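The Pseudocode row lists a UCB algorithm for KCMAB, in which a controller pays compensation to steer myopic users toward exploratory arms. The sketch below is a minimal approximation, not the paper's exact Algorithm 1: the UCB1 index and the compensation rule (paying the empirical gap between the greedy arm and the chosen arm) are assumptions standing in for the paper's specific procedure.

```python
import numpy as np

def run_ucb_kcmab(mu, horizon, rng):
    """One simulated KCMAB run: UCB index plus compensation accounting.

    Assumption: compensation at each step is the gap between the
    empirically best arm's mean and the pulled arm's mean, i.e. the
    amount needed to make a greedy (myopic) user accept the UCB choice.
    The paper's Algorithm 1 may differ in its exact payment rule.
    """
    k = len(mu)
    pulls = np.zeros(k)   # times each arm has been pulled
    means = np.zeros(k)   # empirical mean reward per arm
    regret = 0.0
    compensation = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull each arm once
        else:
            ucb = means + np.sqrt(2.0 * np.log(t) / pulls)
            arm = int(np.argmax(ucb))
            # Pay the empirical gap whenever the UCB choice is not the
            # greedy arm (the max(...) is zero when it is).
            compensation += max(0.0, means.max() - means[arm])
        reward = float(rng.random() < mu[arm])  # Bernoulli reward draw
        pulls[arm] += 1
        means[arm] += (reward - means[arm]) / pulls[arm]
        regret += mu.max() - mu[arm]
    return regret, compensation
```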
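The Experiment Setup row pins down everything the quoted excerpt gives for recreating the simulated environment: nine arms with expected rewards µ = [0.9, ..., 0.1], a horizon of T = 10000 steps, and results averaged over 1000 runs. A harness under those settings, reusing `run_ucb_kcmab` from the sketch above; treating the arm means as Bernoulli parameters is an assumption, since the excerpt states the expected rewards but not the reward distribution.

```python
import numpy as np

# Settings quoted from the paper's experiment setup.
MU = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])
T = 10_000
RUNS = 1000  # the paper averages results over 1000 runs

rng = np.random.default_rng(2018)  # seed chosen here, not by the paper
results = np.array([run_ucb_kcmab(MU, T, rng) for _ in range(RUNS)])
avg_regret, avg_compensation = results.mean(axis=0)
print(f"avg regret: {avg_regret:.1f}  avg compensation: {avg_compensation:.1f}")
```

Since the paper releases no code, hardware, or software details, a harness like this is the minimum scaffolding a reproduction attempt would need to build from the text alone.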