Bandit Learning with Delayed Impact of Actions

Authors: Wei Tang, Chien-Ju Ho, Yang Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an algorithm that achieves a regret of O(KT^{2/3}) and show a matching regret lower bound of Ω(KT^{2/3}), where K is the number of arms and T is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms. Finally, we conduct a series of simulations showing that our algorithms compare favorably to other state-of-the-art methods proposed in other application domains.
Researcher Affiliation | Academia | Wei Tang, Chien-Ju Ho, and Yang Liu; Washington University in St. Louis and University of California, Santa Cruz; {w.tang, chienju.ho}@wustl.edu, yangliu@ucsc.edu
Pseudocode | Yes | Algorithm 1: Action-Dependent UCB; Algorithm 2: Reduction Template; Algorithm 3: History-Dependent UCB
Open Source Code | No | The paper does not provide any links to, or explicit statements about the availability of, its source code.
Open Datasets | No | The paper refers to 'simulations' but does not name any publicly available dataset or provide concrete access information for data.
Dataset Splits | No | The paper mentions 'simulations' but does not provide the dataset-split information (e.g., percentages, sample counts, or cross-validation details) needed to reproduce data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details, such as the exact CPU or GPU models used to run its experiments.
Software Dependencies | No | The paper does not name ancillary software, such as libraries or solvers with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper states 'The detailed setups and discussion are in Appendix I', but Appendix I is not provided; the main text does not contain specific experimental setup details such as hyperparameter values or training configurations.
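The paper's pseudocode itself is not reproduced on this page. For orientation only, the sketch below shows a standard UCB1 index policy, the generic family that the paper's Action-Dependent and History-Dependent variants build on; the function names, reward model, and parameters here are illustrative assumptions, not the paper's algorithms, which additionally handle delayed impact of actions.

```python
import math
import random

def ucb1(pull, K, T):
    """Generic UCB1 sketch (NOT the paper's Action-Dependent UCB).

    pull(arm) -> reward in [0, 1]; K arms; horizon T.
    Returns the sequence of arms played.
    """
    counts = [0] * K      # number of pulls per arm
    means = [0.0] * K     # running empirical mean reward per arm
    history = []
    for t in range(T):
        if t < K:
            arm = t  # initialization: play each arm once
        else:
            # UCB index = empirical mean + exploration bonus
            arm = max(
                range(K),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t + 1) / counts[a]),
            )
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        history.append(arm)
    return history

# Usage: two Bernoulli arms; the better arm (index 1) should dominate the plays.
random.seed(0)
probs = [0.2, 0.8]
plays = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0, K=2, T=2000)
```

In this stationary setting the exploration bonus shrinks as an arm is sampled, so pulls concentrate on the empirically best arm; the paper's setting differs in that an action's reward depends on how it was chosen in the past, which standard UCB does not account for.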