reproducibilityindex.ai

Adapting to misspecification in contextual bandits with offline regression oracles

Authors: Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Figure 1 shows the behavior of average regret for a realizability-based algorithm FALCON+ (Simchi-Levi & Xu, 2020), under the (incorrect) assumption that the underlying model is linear. The details of this algorithm are not particularly important. It suffices to understand that at the end of approximately every $2^m$ rounds (we call these intervals epochs and index them by $m$), the algorithm computed an estimate $\hat{f}(x, a)$ of (1), assuming a linear model and based on data from the previous epoch. Then, for every round in this epoch, it selects arms based on a probabilistic model where arms with high $\hat{f}(x, \cdot)$ have higher probability. A full description of this example is given in the appendix. [...] To complete our discussion from Section 1, we simulate a version of linear Safe-FALCON on Example (1). In particular, we implement a version of Safe-FALCON that uses two misspecification tests, a test that checks if the cumulative reward remains above a lower bound (line 3 of Check-is-safe) and a test that checks if the average per-epoch reward remains above a lower bound (24). Other parameters are chosen as in the introduction example (see Appendix D for details). The results are shown in Figure 2.
Researcher Affiliation	Academia	1Management Science and Engineering, Stanford University, Stanford, CA, USA 2Graduate School of Business, Stanford University, Stanford, CA, USA. Correspondence to: Sanath Kumar Krishnamurthy <sanathsk@stanford.edu>.
Pseudocode	Yes	Algorithm 1 Safe-FALCON input: Initial epoch length $\tau_1 \geq 2$, confidence parameter $\delta \in (0, 1)$. [...] Algorithm 2 Check-is-safe input: Epoch $m$, time-step $t$, lower bound $l_{m-1}$, and $\mathrm{Crwd}_t$. [...] Algorithm 3 Choose-safe input: Epoch $m$, lower bound $l_{m-1}$, and data collected in the $m$-th epoch $S_m$.
Open Source Code	No	The paper does not provide any links to source code or explicit statements about its public availability.
Open Datasets	No	The paper uses a synthetic example defined as "$\mathbb{E}[r_t \mid x_t = x, a_t = a] = \mathbb{I}\{x_t > 0.5\}$ if $a = 1$, and $0.5$ if $a = 2$ (1), where $x_t \sim \mathrm{Uniform}[0, 1]$ represents the contexts observed at the beginning of the $t$-th round, $a_t$ is the action taken, $r_t$ is the reward, which is observed by the experimenter with noise $e_t \sim N(0, 1)$." This is a described simulation setup, not a publicly available dataset with a concrete access method (link, DOI, etc.).
Dataset Splits	No	The paper describes a simulation setup and mentions
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers.
Experiment Setup	Yes	Other parameters are chosen as in the introduction example (see Appendix D for details). The full description of this example is given in the appendix.
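
Because no code or dataset is released, the synthetic example quoted under Open Datasets is the only part that can be reproduced from this page alone. The following is a minimal sketch of that environment, not the authors' implementation. The function names (`expected_reward`, `draw_round`, `optimal_arm`, `average_regret`) are hypothetical, and measuring regret against the conditional mean reward (rather than the noisy observed reward) is an assumption made here for illustration.

```python
import random


def expected_reward(x, a):
    """Conditional mean reward from Equation (1): arm 1 pays I{x > 0.5}, arm 2 pays 0.5."""
    if a == 1:
        return 1.0 if x > 0.5 else 0.0
    if a == 2:
        return 0.5
    raise ValueError("arm must be 1 or 2")


def draw_round(rng, a):
    """Sample a context x ~ Uniform[0, 1] and a noisy reward for arm a (noise ~ N(0, 1))."""
    x = rng.random()
    reward = expected_reward(x, a) + rng.gauss(0.0, 1.0)
    return x, reward


def optimal_arm(x):
    """Arm with the highest conditional mean reward for context x."""
    return 1 if x > 0.5 else 2


def average_regret(policy, n_rounds, seed=0):
    """Average per-round regret of `policy`, measured on conditional mean rewards (assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        x = rng.random()
        a = policy(x)
        total += expected_reward(x, optimal_arm(x)) - expected_reward(x, a)
    return total / n_rounds


if __name__ == "__main__":
    # A context-blind policy that always plays arm 2 loses 0.5 whenever x > 0.5.
    print(average_regret(lambda x: 2, 20000))
    # The optimal policy has zero regret by construction.
    print(average_regret(optimal_arm, 20000))
```

Under this setup, either constant policy has an expected average regret of 0.25, since each one is wrong by 0.5 on half of the contexts. This gives a baseline against which a misspecified learner can be compared.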