Robust Contextual Bandits via Bootstrapping

Authors: Qiao Tang, Hong Xie, Yunni Xia, Jia Lee, Qingsheng Zhu (pp. 12182-12189)

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We prove that BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance. We conduct extensive experiments to validate the superior performance of our BootLinUCB algorithm over the latest bootstrapping-based LinUCB algorithm (Hao et al. 2019) and the classical LinUCB algorithm (Chu et al. 2011).
Researcher Affiliation Academia Chongqing Key Laboratory of Software Theory and Technology, Chongqing University
Pseudocode Yes Algorithm 1: BootLinUCB algorithmic framework. Algorithm 2: BootQuantile($x_a$, $H_{t-1}$, $1 - 2\alpha_t$)
Open Source Code No The paper mentions 'details refer to our code' but does not provide a specific link, repository, or explicit statement of open-source availability for its methodology.
Open Datasets No The paper uses synthetic data generated by the authors, stating 'We generate the feature vectors of A arms as follows: (1) generate $\min\{d, A\}$ orthogonal feature vectors with unit square norm (details refer to our code); (2) each of the remaining $A - \min\{d, A\}$ feature vectors is drawn from $[0, 1]^d$ uniformly at random. The preference parameter θ is drawn from $[0, 1]^d$ uniformly at random.' However, it does not provide concrete access information (link, DOI, formal citation) for this generated dataset.
Dataset Splits No The paper describes an online learning framework over 'T = 2000 decision rounds' and uses synthetic data, but it does not specify explicit training, validation, or test dataset splits in the traditional sense for reproducibility.
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies No The paper does not provide specific software dependencies, such as programming language versions or library versions, that were used for the experiments.
Experiment Setup Yes Consider T = 2000 decision rounds. ... We set $\alpha_t = 1/\sqrt{t+2}$, $\delta = 1/(t+2)$ for our BootLinUCB, and set $\alpha_t = 1/\sqrt{t+2}$ for the LinUCB algorithm. ... Unless we state explicitly, we consider the following default parameters: A = 20 arms, features with d = 10 dimensions, regularization parameter λ = 1, reward variance σ = 1. ... We use Monte Carlo simulation to estimate the quantile estimator $\hat{q}_t(x, \hat{\epsilon}, \alpha)$ with 1000 simulation rounds.
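
Since the paper's code is not available, the synthetic setup quoted in the Open Datasets and Experiment Setup rows can only be approximated. The following is a minimal sketch, not the authors' implementation. It assumes the orthogonal unit-norm vectors come from a QR decomposition of a random Gaussian matrix and that reward noise is Gaussian; neither choice is stated in the paper. The function names (make_arm_features, make_instance, draw_reward) and the seed are hypothetical.

```python
import numpy as np


def make_arm_features(num_arms, dim, rng):
    """Return a (num_arms, dim) matrix with one feature vector per arm."""
    k = min(dim, num_arms)
    # Assumption: orthonormal vectors via QR of a random Gaussian matrix.
    # The paper defers its own construction to code that is not released.
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))  # q has shape (dim, k)
    orthogonal = q.T  # k mutually orthogonal rows, each with unit norm
    # Remaining A - min{d, A} vectors are drawn uniformly from [0, 1]^d.
    rest = rng.uniform(0.0, 1.0, size=(num_arms - k, dim))
    return np.vstack([orthogonal, rest])


def make_instance(num_arms=20, dim=10, seed=0):
    """Build one synthetic problem with the quoted defaults A = 20, d = 10."""
    rng = np.random.default_rng(seed)  # seed is an illustrative choice
    features = make_arm_features(num_arms, dim, rng)
    theta = rng.uniform(0.0, 1.0, size=dim)  # preference parameter
    return features, theta, rng


def draw_reward(features, theta, arm, rng, sigma=1.0):
    """Sample a noisy linear reward for the chosen arm."""
    # Assumption: zero-mean Gaussian noise with scale sigma = 1.
    return float(features[arm] @ theta + sigma * rng.normal())


features, theta, rng = make_instance()
reward = draw_reward(features, theta, arm=0, rng=rng)
```

With the defaults, the first 10 rows of features are orthonormal and the remaining 10 lie in the unit hypercube. A bandit algorithm would call draw_reward once per round for T = 2000 rounds.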