Robust Bandit Learning with Imperfect Context

Authors: Jianyi Yang, Shaolei Ren (pp. 10594-10602)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we apply Max Min UCB and Min WD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis." and "In Fig. 2, we compare different algorithms in terms of three cumulative regret objectives: robust regret in Eqn. (7), worst-case regret in Eqn. (8) and true regret in Eqn. (2)."
Researcher Affiliation | Academia | Jianyi Yang, Shaolei Ren, University of California, Riverside ({jyang239, shaolei}@ucr.edu)
Pseudocode | Yes | Algorithm 1: Robust Arm Selection with Imperfect Context (the max-min selection idea is sketched after this table)
Open Source Code | No | No explicit statement about releasing source code for the methodology, and no link to a code repository, is found.
Open Datasets | No | "Finally, we apply Max Min UCB and Min WD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis." and "Given a sequence of true contexts, imperfect context sequence is generated by sampling i.i.d. uniform distribution over B(x_t) at each round." This indicates synthetic data, not a publicly available dataset with access information (context generation is sketched after this table).
Dataset Splits | No | The paper describes synthetic simulations over time slots in an online learning scenario but does not provide train/validation/test dataset splits.
Hardware Specification | No | The paper describes running synthetic simulations but does not provide any specific hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions specific parameters for the Gaussian kernel and other settings but does not list any software dependencies with version numbers.
Experiment Setup | Yes | "In the simulations, Gaussian kernel with parameter 0.1 is used for reward (loss) estimation. λ in Eqn. (3) is set as 0.1. The exploration rate is set as h_t = 0.04." (these settings are collected in the configuration sketch after this table)
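
The max-min rule referenced in the Research Type and Pseudocode rows is not reproduced from the paper; the Python sketch below only illustrates the selection principle suggested by the name Max Min UCB: pick the arm whose worst-case upper confidence bound over the context uncertainty ball is largest. The function signature, the `ucb` callable, and the way candidate contexts are enumerated are assumptions, not the authors' Algorithm 1.

```python
import numpy as np

def max_min_ucb_arm(ucb, contexts_in_ball, n_arms):
    """Pick the arm maximizing the minimum UCB over candidate true contexts.

    ucb: assumed callable (arm, context) -> upper confidence bound on reward.
    contexts_in_ball: iterable of candidate contexts from the uncertainty
        ball around the observed (imperfect) context.
    """
    best_arm, best_val = None, -np.inf
    for arm in range(n_arms):
        # Worst-case (smallest) UCB of this arm over the uncertainty ball.
        worst = min(ucb(arm, x) for x in contexts_in_ball)
        if worst > best_val:
            best_arm, best_val = arm, worst
    return best_arm
```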
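The Open Datasets row quotes the paper's synthetic data generation: imperfect contexts are sampled i.i.d. from a uniform distribution over the ball B(x_t) around each true context. A minimal sketch, assuming a Euclidean ball of radius `epsilon` in d dimensions; the radius, dimensionality, and true-context distribution below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_imperfect_context(x_true, epsilon, rng):
    """Draw one imperfect context uniformly from the Euclidean ball of radius epsilon around x_true."""
    d = x_true.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Uniform-in-ball radius: r = epsilon * U**(1/d) with U ~ Uniform(0, 1).
    radius = epsilon * rng.uniform() ** (1.0 / d)
    return x_true + radius * direction

rng = np.random.default_rng(0)
true_contexts = rng.uniform(size=(100, 2))  # hypothetical true context sequence
imperfect = np.array([sample_imperfect_context(x, 0.05, rng) for x in true_contexts])
```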
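The Experiment Setup row reports the only numeric settings quoted from the paper: a Gaussian kernel with parameter 0.1 for reward (loss) estimation, λ = 0.1 in Eqn. (3), and exploration rate h_t = 0.04. The sketch below collects these constants and pairs them with a generic kernel ridge reward estimator; interpreting 0.1 as the kernel bandwidth and using kernel ridge regression for Eqn. (3) are assumptions, since the paper's exact estimator is not reproduced here.

```python
import numpy as np

KERNEL_BANDWIDTH = 0.1   # "Gaussian kernel with parameter 0.1" (bandwidth interpretation assumed)
LAMBDA_REG = 0.1         # lambda in the paper's Eqn. (3)
EXPLORATION_RATE = 0.04  # h_t; would enter the exploration bonus, not this estimator

def gaussian_kernel(x, y, bandwidth=KERNEL_BANDWIDTH):
    """Gaussian (RBF) kernel between two context vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def kernel_ridge_estimate(x_query, X_hist, r_hist, lam=LAMBDA_REG):
    """Kernel ridge regression estimate of the expected reward at x_query.

    X_hist: array of past contexts, r_hist: array of observed rewards.
    """
    n = len(X_hist)
    K = np.array([[gaussian_kernel(xi, xj) for xj in X_hist] for xi in X_hist])
    k = np.array([gaussian_kernel(x_query, xi) for xi in X_hist])
    alpha = np.linalg.solve(K + lam * np.eye(n), r_hist)
    return k @ alpha
```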