Adapting to misspecification in contextual bandits with offline regression oracles
Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 shows the behavior of average regret for a realizability-based algorithm FALCON+ (Simchi-Levi & Xu, 2020), under the (incorrect) assumption that the underlying model is linear. The details of this algorithm are not particularly important. It suffices to understand that at the end of approximately every $2^m$ rounds (we call these intervals epochs and index them by $m$), the algorithm computed an estimate $\hat{f}(x, a)$ of (1), assuming a linear model and based on data from the previous epoch. Then, for every round in this epoch, it selects arms based on a probabilistic model where arms with high $\hat{f}(x, \cdot)$ have higher probability. A full description of this example is given in the appendix. [...] To complete our discussion from Section 1, we simulate a version of linear Safe-FALCON on Example (1). In particular, we implement a version of Safe-FALCON that uses two misspecification tests, a test that checks if the cumulative reward remains above a lower bound (line 3 of Check-is-safe) and a test that checks if the average per-epoch reward remains above a lower bound (24). Other parameters are chosen as in the introduction example (see Appendix D for details). The results are shown in Figure 2. |
| Researcher Affiliation | Academia | 1Management Science and Engineering, Stanford University, Stanford, CA, USA 2Graduate School of Business, Stanford University, Stanford, CA, USA. Correspondence to: Sanath Kumar Krishnamurthy <sanathsk@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Safe-FALCON input: Initial epoch length $\tau_1 \geq 2$, confidence parameter $\delta \in (0, 1)$. [...] Algorithm 2 Check-is-safe input: Epoch $m$, time-step $t$, lower bound $l_{m-1}$, and $\text{Crwd}_t$. [...] Algorithm 3 Choose-safe input: Epoch $m$, lower bound $l_{m-1}$, and data collected in the $m$-th epoch $S_m$. |
| Open Source Code | No | The paper does not provide any links to source code or explicit statements about its public availability. |
| Open Datasets | No | The paper uses a synthetic example defined as "$\mathbb{E}[r_t \mid x_t = x, a_t = a] = \begin{cases} \mathbb{I}\{x_t > 0.5\} & \text{if } a = 1 \\ 0.5 & \text{if } a = 2 \end{cases} \quad (1)$ where $x_t \sim \text{Uniform}[0, 1]$ represents the contexts observed at the beginning of the $t$-th round, $a_t$ is the action taken, $r_t$ is the reward, which is observed by the experimenter with noise $e_t \sim N(0, 1)$." This is a described simulation setup, not a publicly available dataset with a concrete access method (link, DOI, etc.). |
| Dataset Splits | No | The paper describes a simulation setup and mentions |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Other parameters are chosen as in the introduction example (see Appendix D for details). The full description of this example is given in the appendix. |
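
Since the paper releases no code, the synthetic environment of Example (1) quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`expected_reward`, `draw_round`, `optimal_policy`, `average_regret`) are hypothetical, and the sketch covers only the data-generating process and the regret computation, not FALCON+ or Safe-FALCON.

```python
import random


def expected_reward(x, a):
    """Conditional mean reward of Example (1): arm 1 pays I{x > 0.5}, arm 2 pays 0.5."""
    if a == 1:
        return 1.0 if x > 0.5 else 0.0
    if a == 2:
        return 0.5
    raise ValueError("arm must be 1 or 2")


def optimal_policy(x):
    """Best arm under Example (1): arm 1 when x > 0.5, otherwise arm 2."""
    return 1 if x > 0.5 else 2


def draw_round(rng, policy):
    """Simulate one round: context, chosen arm, and noisy observed reward."""
    x = rng.random()  # context drawn uniformly from [0, 1)
    a = policy(x)
    r = expected_reward(x, a) + rng.gauss(0.0, 1.0)  # additive N(0, 1) noise
    return x, a, r


def average_regret(policy, rounds, seed=0):
    """Average expected regret of a fixed policy against the optimal policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        x = rng.random()
        best = expected_reward(x, optimal_policy(x))
        total += best - expected_reward(x, policy(x))
    return total / rounds
```

Under these assumptions, a policy that always plays arm 2 loses 0.5 whenever the context exceeds 0.5, so its average regret is close to 0.25, while the optimal policy has zero regret by construction.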