Toward Optimal Solution for the Context-Attentive Bandit Problem

Authors: Djallel Bouneffouf, Raphael Feraud, Sohini Upadhyay, Irina Rish, Yasaman Khazaeni

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets. From Section 5 (Experiments): We compare the proposed CATS algorithm with: (1) Random-EI: in addition to the V observed features, this algorithm selects a Random subset of features of the specified size U at Each Iteration (thus, Random-EI), and then invokes the linear Thompson sampling algorithm. (2) Random-fix: this algorithm invokes linear Thompson sampling on a subset of U + V features, where the additional subset of U features is randomly selected once prior to seeing any data samples and remains fixed. (3) The state-of-the-art method for context-attentive bandits proposed in [Bouneffouf et al., 2017], Thompson Sampling with Restricted Context (TSRC): TSRC solves the CBRC (contextual bandit with restricted context) problem discussed earlier: at each iteration, the algorithm decides on U + V features to observe (referred to as unobserved context). In our setting, however, V out of N features are immediately observed at each iteration (referred to as known context); the TSRC decision mechanism is then used to select U additional unknown features to observe, followed by linear Thompson sampling on the U + V features. (4) CALINUCB: where we replace the contextual TS in CATS with LINUCB. (5) CATS-fix: a heuristic where we stop the feature exploration after some fraction of the iterations, T = 10%, 20%, ..., 90% (we report here an average over the best results). Empirical evaluation of Random-fix, Random-EI, CATS-fix, CALINUCB, CATS and TSRC was performed on several publicly available datasets, as well as on a proprietary corporate dialog orchestration dataset. (A minimal sketch of the Random-EI baseline is given after this table.)
Researcher Affiliation | Collaboration | Djallel Bouneffouf (1), Raphael Feraud (2), Sohini Upadhyay (1), Irina Rish (3) and Yasaman Khazaeni (1); (1) IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA; (2) Orange Labs; (3) Université de Montréal. Emails: {Djallel.Bouneffouf@, Sohini.upadhyay@, Yasaman.khazaeni@}ibm.com, Raphael.Feraud@orange.com, irina.rish@mila.quebec
Pseudocode | Yes | Algorithm 1, Context Attentive Bandit Problem (CAB), and Algorithm 2, Context Attentive Thompson Sampling (CATS)
Open Source Code | No | No statement explicitly providing concrete access to source code for the methodology described in this paper was found.
Open Datasets | Yes | The publicly available Covertype and CNAE-9 datasets were featured in the original TSRC paper, and Warfarin [Sharabiani et al., 2015] is a historically popular dataset for evaluating bandit methods.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers). It mentions "TS {0.25, 0.5, 1} and LINUCB {0.51, 1, 2}", which refer to exploration parameters, not software dependencies with versions.
Experiment Setup | Yes | Algorithm 2, Context Attentive Thompson Sampling (CATS), line 1: Require: N, V, U, K, T, CV, α > 0, λ(t). Note that we have used the same exploration parameter values as in [Chapelle and Li, 2011] for TS- and LINUCB-type algorithms, namely TS {0.25, 0.5, 1} and LINUCB {0.51, 1, 2}. To account for possible nonstationarity in the unknown context space over time, we introduce a weight-decay parameter λ(t) that reduces the effect of past examples when updating the CATS parameters. We refer to the stationary case as CATS and fix λ(t) = 1. For the nonstationary setting, we simulate nonstationarity in the unknown feature space by duplicating each dataset, randomly fixing the known context in the same manner as above, and shuffling the (unknown feature set, label) pairs. We then stochastically replace events in the original dataset with their shuffled counterparts, with the probability of replacement increasing uniformly with each additional event. (A sketch of this nonstationarity simulation is given below.)
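For concreteness, here is a minimal Python sketch of the Random-EI baseline described in the Experiments row above: at each round it observes the V known features, draws a fresh random subset of U additional feature indices, and runs a standard Gaussian linear Thompson sampling step on the resulting (U + V)-dimensional context. The variable names and the particular posterior update are illustrative assumptions, not the authors' code.

```python
import numpy as np

def random_ei(contexts, rewards, known_idx, U, n_arms, v=0.25, seed=0):
    """Sketch of the Random-EI baseline: a random subset of U extra features
    each round, followed by linear (Gaussian) Thompson sampling.

    contexts:  (T, N) array of full feature vectors (only a subset is revealed)
    rewards:   (T, n_arms) array; only the pulled arm's reward is used (bandit feedback)
    known_idx: indices of the V always-observed features
    v:         exploration parameter (e.g. 0.25, 0.5, or 1, as in Chapelle & Li, 2011)
    """
    rng = np.random.default_rng(seed)
    T, N = contexts.shape
    d = len(known_idx) + U                        # dimension of the observed context
    unknown_pool = np.setdiff1d(np.arange(N), known_idx)

    # Per-arm Gaussian-posterior statistics for linear Thompson sampling
    B = [np.eye(d) for _ in range(n_arms)]        # precision matrices
    f = [np.zeros(d) for _ in range(n_arms)]      # reward-weighted context sums
    cumulative_reward = 0.0

    for t in range(T):
        extra_idx = rng.choice(unknown_pool, size=U, replace=False)  # "Random ... Each Iteration"
        idx = np.concatenate([np.asarray(known_idx), extra_idx])
        x = contexts[t, idx]                      # the U + V features actually observed

        # Sample a parameter vector per arm and play the greedy arm w.r.t. the samples
        scores = []
        for a in range(n_arms):
            B_inv = np.linalg.inv(B[a])
            mu = B_inv @ f[a]
            theta = rng.multivariate_normal(mu, v ** 2 * B_inv)
            scores.append(x @ theta)
        arm = int(np.argmax(scores))

        r = rewards[t, arm]
        cumulative_reward += r
        B[arm] += np.outer(x, x)                  # standard linear TS posterior update
        f[arm] += r * x

    return cumulative_reward
```

Random-fix differs only in drawing extra_idx once before the loop and reusing it; TSRC and CATS replace the uniform random draw with a learned, Thompson-sampling-based choice over the unknown features.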
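The nonstationarity protocol from the Experiment Setup row can be sketched similarly: duplicate the dataset, shuffle the (unknown-features, label) pairs in the copy, and then replace each original event by its shuffled counterpart with a probability that grows uniformly with the event index. The function and argument names below are hypothetical.

```python
import numpy as np

def simulate_nonstationarity(unknown_feats, labels, seed=0):
    """Sketch of the nonstationary variant: stochastically swap in shuffled
    (unknown-features, label) pairs, with replacement probability increasing
    uniformly from 0 to 1 over the course of the event stream."""
    rng = np.random.default_rng(seed)
    T = len(labels)

    # Duplicate the data and shuffle the (unknown features, label) pairs together
    perm = rng.permutation(T)
    shuffled_feats, shuffled_labels = unknown_feats[perm], labels[perm]

    out_feats, out_labels = unknown_feats.copy(), labels.copy()
    for t in range(T):
        p_replace = t / max(T - 1, 1)             # grows uniformly with each additional event
        if rng.random() < p_replace:
            out_feats[t], out_labels[t] = shuffled_feats[t], shuffled_labels[t]
    return out_feats, out_labels
```

One plausible reading of the weight decay is that the learner multiplies its accumulated statistics by λ(t) before each update (e.g., B[arm] = lam_t * B[arm] + np.outer(x, x)), so that λ(t) = 1 recovers the stationary CATS case; the excerpt does not spell out the exact update, so this is an assumption.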