Bandits with Unobserved Confounders: A Causal Approach
Authors: Elias Bareinboim, Andrew Forney, Judea Pearl
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empower Thompson Sampling with this new principle and run extensive simulations. The experiments suggest that the new strategy is statistically efficient and consistent (Sec. 4). In the next section, we provide simulations to support the efficacy of TSC in the MABUC context. Experiment 1 employs the Greedy Casino parameterization... Each experiment compares the performance of traditional Thompson Sampling bandit players versus TSC. All reported simulations are partitioned into rounds of T = 1000 trials averaged over N = 1000 Monte Carlo repetitions. |
| Researcher Affiliation | Academia | Elias Bareinboim, Department of Computer Science, Purdue University, eb@purdue.edu; Andrew Forney, Department of Computer Science, University of California, Los Angeles, forns@cs.ucla.edu; Judea Pearl, Department of Computer Science, University of California, Los Angeles, judea@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Causal Thompson Sampling (TSC). 1: procedure TSC(Pobs, T) 2: E(YX=a | X) ← Pobs(y | X) (seed distribution) 3: for t = 1, ..., T do 4: x ← intuition(t) (get intuition for trial) 5: Q1 ← E(YX=x' | X = x) (estimated payout for counter-intuition) 6: Q2 ← P(y | X = x) (estimated payout for intuition) 7: w ← [1, 1] (initialize weights) 8: bias ← 1 − |Q1 − Q2| (compute weighting strength) 9: if Q1 > Q2 then w[x] ← bias else w[x'] ← bias (choose arm to bias) 10: a ← max(β(sM1,x, fM1,x) · w[1], β(sM2,x, fM2,x) · w[2]) (choose arm) 11: y ← pull(a) (receive reward) 12: E(YX=a | X = x) ← y | a, x (update) (see the Python sketch after this table) |
| Open Source Code | Yes | (For simulation source code, see https://github.com/ucla-csl/mabuc ) |
| Open Datasets | No | The paper describes two synthetic scenarios with payout rates provided in tables (Table 1 and Table 2), which define the data generation process for the simulations. It does not refer to a pre-existing, publicly available dataset with a specific link, DOI, or repository for direct download. |
| Dataset Splits | No | The paper conducts simulations in a sequential decision-making setting, where algorithms learn online. It describes the number of trials and Monte Carlo repetitions, but it does not specify traditional train/validation/test dataset splits as one would for a static dataset. |
| Hardware Specification | No | The paper describes simulations but does not provide specific hardware details (such as CPU/GPU models, memory, or specific computing environments) used to run these experiments. |
| Software Dependencies | No | The paper mentions algorithms like Thompson Sampling and implies the use of Python via the GitHub link, but it does not specify any software dependencies with version numbers (e.g., library names with specific versions). |
| Experiment Setup | Yes | All reported simulations are partitioned into rounds of T = 1000 trials averaged over N = 1000 Monte Carlo repetitions. At each time step in a single round, (1) values for the unobserved confounders (B, D) and the observed intuitive arm choice (x) are selected by their respective structural equations (see Section 2), (2) the player observes the value of x, (3) the player chooses an arm based on their given strategy to maximize reward (which may or may not employ x), and finally, (4) the player receives a Bernoulli reward Y ∈ {0, 1} and records the outcome. Furthermore, at the start of every round, players possess knowledge of the problem's observational distribution, i.e., each player begins knowing P(Y | X) (see Table 2b). The notation β(sMk,x, fMk,x) means to sample from a Beta distribution with parameters equal to the successes encountered choosing action x on machine Mk (sMk,x) and the failures encountered choosing action x on that machine (fMk,x). (A simulation-harness sketch follows this table.) |
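
To make the pseudocode row (Algorithm 1) concrete, here is a minimal Python/NumPy sketch of a TSC player for a two-arm, two-context bandit. The payout table, the uniform Beta(1, 1) priors, and the way the intuition signal is drawn are illustrative assumptions, not the paper's Greedy Casino parameterization or the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interventional payouts P(y | do(X=a), intuition x); rows index the
# intuited arm x, columns the pulled arm a. Illustrative values, not the paper's Table 1.
PAYOUT = np.array([[0.10, 0.50],
                   [0.50, 0.20]])

# Observational payouts P(y | X=x): the reward rate when players follow their intuition.
P_OBS = PAYOUT.diagonal().copy()


def ts_causal(T=1000):
    """One round of a Causal Thompson Sampling (TSC) player, following Algorithm 1."""
    s = np.ones((2, 2))  # successes[x, a], uniform Beta(1, 1) prior per (intuition, arm)
    f = np.ones((2, 2))  # failures[x, a]
    rewards = np.zeros(T)
    for t in range(T):
        x = int(rng.integers(2))        # intuition for this trial
        x_prime = 1 - x                 # the counter-intuitive arm
        # Q1: current estimate of the counterfactual E(Y_{X=x'} | X = x)
        q1 = s[x, x_prime] / (s[x, x_prime] + f[x, x_prime])
        # Q2: observational payout P(y | X = x), known from the start of the round
        q2 = P_OBS[x]
        w = np.ones(2)
        bias = 1.0 - abs(q1 - q2)       # weighting strength
        if q1 > q2:
            w[x] = bias                 # dampen the intuitive arm
        else:
            w[x_prime] = bias           # dampen the counter-intuitive arm
        a = int(np.argmax(rng.beta(s[x], f[x]) * w))   # biased Thompson draw
        y = rng.random() < PAYOUT[x, a]                # Bernoulli reward
        s[x, a] += y
        f[x, a] += 1 - y
        rewards[t] = y
    return rewards
```

The weighting step (lines 7-9 of Algorithm 1) is the only departure from vanilla Thompson Sampling: whichever arm the comparison of the counterfactual estimate Q1 against the observational estimate Q2 disfavors has its posterior draw scaled down by bias = 1 − |Q1 − Q2|.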
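The experiment-setup row describes the simulation protocol: rounds of T = 1000 trials averaged over N = 1000 Monte Carlo repetitions, comparing traditional Thompson Sampling players against TSC. A hedged sketch of that harness, reusing the hypothetical PAYOUT table and ts_causal function from the block above, might look like this:

```python
def ts_standard(T=1000):
    """A conventional Thompson Sampling player that ignores the intuition signal."""
    s = np.ones(2)   # successes per arm
    f = np.ones(2)   # failures per arm
    rewards = np.zeros(T)
    for t in range(T):
        x = int(rng.integers(2))                 # the confounded intuition still drives payouts
        a = int(np.argmax(rng.beta(s, f)))       # Thompson sample over arms only
        y = rng.random() < PAYOUT[x, a]
        s[a] += y
        f[a] += 1 - y
        rewards[t] = y
    return rewards


def run_monte_carlo(N=1000, T=1000):
    """Average per-trial reward over N independent rounds, mirroring the reported T/N protocol."""
    avg_ts, avg_tsc = np.zeros(T), np.zeros(T)
    for _ in range(N):
        avg_ts += ts_standard(T)
        avg_tsc += ts_causal(T)
    return avg_ts / N, avg_tsc / N
```

Averaging the per-trial reward (or, equivalently, tracking cumulative regret) across repetitions is what produces the learning curves the paper reports; because the parameterization here is a placeholder, the resulting curves illustrate the procedure rather than reproduce the paper's figures.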