Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination
Authors: Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines." and "Finally, we conduct numerical simulations to demonstrate the efficacy of the proposed algorithms." |
| Researcher Affiliation | Collaboration | Arpan Mukherjee (Rensselaer Polytechnic Institute, mukhea5@rpi.edu); Ali Tajer (Rensselaer Polytechnic Institute, tajera@rpi.edu); Pin-Yu Chen (IBM Research, Pin-Yu.Chen@ibm.com); Payel Das (IBM Research, daspa@us.ibm.com) |
| Pseudocode | Yes | Algorithm 1: Gap-based algorithm for CBAI (G-CBAI); Algorithm 2: Successive elimination-based algorithm for CBAI (SE-CBAI) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | "We use two real-world datasets, namely the New Yorker Caption Contest (NYCC) dataset and the PKIS2 dataset for comparing our algorithms to the existing ones." and "The NYCC dataset has been obtained from the UCI Machine Learning repository [29] and PKIS2 dataset from [28]." |
| Dataset Splits | No | The paper discusses sample complexity and identifying the best arm in a bandit setting but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and test sets in the typical sense of model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | "We consider a Gaussian bandit instance with K = 4, and the true mean vector is µ = [2.5, 2.3, 2, 0.6]." and "the attack probability ε is set to ε = 0.1." The parameter choices for the algorithms, such as T(α, δ) and βi(t, δ), are derived and described in the paper's theoretical sections. |
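
The Gaussian instance quoted in the Experiment Setup row can be simulated in a few lines. This is a minimal sketch under stated assumptions and is not the authors' G-CBAI or SE-CBAI. It is a generic successive-elimination loop that relies on several assumptions. Contamination follows a Huber-style model: each reward is replaced with probability ε by an adversarial value. The adversary is hypothetical and returns +20 for the worst arm and -20 for every other arm. The robust estimator is a trimmed mean with a hypothetical trimming fraction of 0.2. A hypothetical confidence radius stands in for βi(t, δ). The helper names (`pull`, `trimmed_mean`, `radius`, `successive_elimination`) are illustrative only.

```python
import math
import random

# Instance quoted in the table: Gaussian bandit, K = 4, unit variance assumed.
MU = [2.5, 2.3, 2.0, 0.6]
EPS = 0.1  # attack probability


def pull(arm, rng):
    """One reward: N(MU[arm], 1), replaced with probability EPS by a
    hypothetical adversarial value (+20 for the worst arm, -20 otherwise)."""
    if rng.random() < EPS:
        return 20.0 if arm == len(MU) - 1 else -20.0
    return rng.gauss(MU[arm], 1.0)


def sample_mean(xs):
    """Plain empirical mean (not robust to contamination)."""
    return sum(xs) / len(xs)


def trimmed_mean(xs, alpha=0.2):
    """Drop the int(alpha * n) smallest and largest samples and average the
    rest. alpha = 0.2 is a hypothetical choice that exceeds EPS."""
    s = sorted(xs)
    k = int(alpha * len(s))
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def radius(t, k, delta):
    """Hypothetical anytime radius for unit-variance rewards (union bound
    over arms and rounds). This is NOT the paper's beta_i(t, delta)."""
    return math.sqrt(2.0 * math.log(4.0 * k * t * t / delta) / t)


def successive_elimination(estimator, delta=0.05, batch=100,
                           max_rounds=500, seed=0):
    """Generic successive elimination. Returns the 0-based index of the
    surviving arm."""
    rng = random.Random(seed)
    k = len(MU)
    active = list(range(k))
    samples = {i: [] for i in active}
    for _ in range(max_rounds):
        # Sample every surviving arm `batch` more times.
        for i in active:
            samples[i].extend(pull(i, rng) for _ in range(batch))
        t = len(samples[active[0]])  # all active arms have equal counts
        est = {i: estimator(samples[i]) for i in active}
        r = radius(t, k, delta)
        leader = max(est[i] for i in active)
        # Keep arms whose upper bound still reaches the leader's lower bound.
        active = [i for i in active if est[i] + r >= leader - r]
        if len(active) == 1:
            return active[0]
    # Budget exhausted: report the empirical leader among the survivors.
    return max(active, key=lambda i: estimator(samples[i]))
```

Calling `successive_elimination(trimmed_mean)` and `successive_elimination(sample_mean)` contrasts the two estimators. Under this adversary, the contaminated mean of the worst arm is 0.9 × 0.6 + 0.1 × 20 = 2.54, while that of arm 0 is 0.9 × 2.5 - 0.1 × 20 = 0.25. The plain sample mean is therefore steered toward arm 3. Trimming 20% from each tail removes the outliers, which make up about 10% of the samples, and should recover arm 0, the arm with mean 2.5.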