Multi-armed Bandits with Compensation

Authors: Siwei Wang, Longbo Huang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present experimental results to demonstrate the performance of the algorithms."
Researcher Affiliation | Academia | Siwei Wang, IIIS, Tsinghua University (wangsw15@mails.tsinghua.edu.cn); Longbo Huang, IIIS, Tsinghua University (longbohuang@tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1: the UCB algorithm for KCMAB; Algorithm 2: the modified ε-greedy algorithm for KCMAB; Algorithm 3: the modified Thompson Sampling algorithm for KCMAB; Algorithm 4: procedure Update. (A rough sketch of the setting follows the table.)
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methods is publicly available.
Open Datasets | No | The paper describes a simulated environment with a specific reward vector and number of time steps, but does not reference a publicly available dataset with concrete access information (link, DOI, or citation).
Dataset Splits | No | The paper describes a simulated game run for 10000 time steps and averaged over 1000 runs, but specifies no train/validation/test splits; the setting involves no data partitioning in the conventional sense.
Hardware Specification | No | The paper gives no details of the hardware used to run the experiments (e.g., GPU/CPU model or memory).
Software Dependencies | No | The paper lists no software dependencies, such as library names with version numbers, that would be needed to replicate the experiments.
Experiment Setup | Yes | "In our experiments, there are a total of nine arms with expected reward vector µ = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]. We run the game for T = 10000 time steps. [...] In our experiment, we choose […] = 20. [...] Here we choose […] to be 10, 15 and 20." (A sketch of this setup follows the table.)
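The listed algorithms appear only as pseudocode in the paper. As a rough illustration of the setting, here is a minimal Python sketch of a UCB-style policy with compensation accounting. It assumes Bernoulli rewards, the standard UCB1 exploration bonus, and our reading of the KCMAB compensation rule: the controller pays the myopic player the gap between the current best empirical mean and the empirical mean of the arm it asks the player to pull. The function name `kcmab_ucb` and the exact constants are illustrative, not taken from the paper.

```python
import numpy as np

def kcmab_ucb(mu, T, rng):
    """One run of a UCB-style policy in the KCMAB setting.

    Returns (regret, compensation): cumulative pseudo-regret and the
    total compensation paid to steer the myopic player away from the
    empirically best arm.
    """
    K = len(mu)
    counts = np.zeros(K)   # pulls per arm
    sums = np.zeros(K)     # cumulative observed reward per arm
    regret = 0.0
    compensation = 0.0
    best_mu = max(mu)

    for t in range(1, T + 1):
        if t <= K:
            # Forced initialization: pull each arm once
            # (compensation during this phase is ignored here).
            arm = t - 1
        else:
            means = sums / counts
            ucb = means + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
            # A myopic player would pull the empirically best arm; to
            # redirect it, pay the empirical-mean gap (our assumption
            # about the KCMAB compensation rule).
            compensation += means.max() - means[arm]
        reward = float(rng.random() < mu[arm])  # Bernoulli reward draw
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mu - mu[arm]
    return regret, compensation
```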
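Likewise, a hedged sketch of the quoted experimental protocol, reusing `kcmab_ucb` from the block above. The reward vector and horizon come from the quote; the number of runs is cut from the paper's 1000 to keep the demo fast.

```python
import numpy as np

# Settings quoted in the paper's experiment section; RUNS is reduced
# from the paper's 1000 independent runs for speed.
MU = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
T = 10_000
RUNS = 100

rng = np.random.default_rng(0)
results = [kcmab_ucb(MU, T, rng) for _ in range(RUNS)]
avg_regret = np.mean([r for r, _ in results])
avg_comp = np.mean([c for _, c in results])
print(f"average regret: {avg_regret:.1f}, "
      f"average compensation: {avg_comp:.1f}")
```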