Bandit Learning with Delayed Impact of Actions
Authors: Wei Tang, Chien-Ju Ho, Yang Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an algorithm that achieves a regret of $\tilde{O}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where K is the number of arms and T is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms. Finally, we conduct a series of simulations showing that our algorithms compare favorably to other state-of-the-art methods proposed in other application domains. |
| Researcher Affiliation | Academia | Wei Tang, Chien-Ju Ho, and Yang Liu; Washington University in St. Louis; University of California, Santa Cruz; {w.tang, chienju.ho}@wustl.edu, yangliu@ucsc.edu |
| Pseudocode | Yes | Algorithm 1 Action-Dependent UCB; Algorithm 2 Reduction Template; Algorithm 3 History-Dependent UCB |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of its source code. |
| Open Datasets | No | The paper refers to 'simulations' but does not specify any publicly available datasets used, nor does it provide any concrete access information for data. |
| Dataset Splits | No | The paper mentions 'simulations' but does not provide specific dataset split information (e.g., percentages, sample counts, or cross-validation details) needed to reproduce data partitioning. |
| Hardware Specification | No | The paper mentions 'simulations' but does not provide specific hardware details such as exact GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | No | The paper states 'The detailed setups and discussion are in Appendix I', but Appendix I was not available for review. The main text does not contain specific experimental setup details such as hyperparameter values or training configurations. |
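The regret bound of order $KT^{2/3}$ quoted in the Research Type row is sublinear in T, so the per-round (average) regret shrinks like $KT^{-1/3}$. The sketch below tabulates that scaling only; it drops all constants and any logarithmic factors, and the function names and the choice K = 10 are illustrative assumptions, not taken from the paper.

```python
def regret_scale(k: int, t: int) -> float:
    """Order-of-magnitude cumulative regret K * T^(2/3); constants and log factors dropped."""
    return k * t ** (2 / 3)


def average_regret_scale(k: int, t: int) -> float:
    """Per-round regret K * T^(-1/3); it tends to 0 as T grows because the bound is sublinear."""
    return regret_scale(k, t) / t


if __name__ == "__main__":
    K = 10  # hypothetical number of arms, chosen only for illustration
    for T in (10**3, 10**6, 10**9):
        print(
            f"T={T:>10}  K*T^(2/3)={regret_scale(K, T):>12.0f}  "
            f"per-round={average_regret_scale(K, T):.4f}"
        )
```

Because the per-round term scales as $T^{-1/3}$, each thousandfold increase in T cuts it by a factor of 10, while cumulative regret still grows, just more slowly than T.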