Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Multi-Armed Bandits with Interference: Bridging Causal Inference and Adversarial Bandits
Authors: Su Jia, Peter I. Frazier, Nathan Kallus
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We visualize the results in Figures 4 and 5. Consistent with the theoretical analysis, CR outperforms SB in terms of 95% percentile regret, without sacrificing mean regret. Moreover, we observe that when N = T 3, the regret exhibits a smaller deviation. Finally, we observe that when N = T 3, the 95 regret percentile of CR is lower compared to the N = T 2 case. This is reasonable since a larger N helps reduce the variance in the reward estimation. |
| Researcher Affiliation | Academia | 1Center for Data Science for Enterprise and Society, Cornell University 2School of Operations Research and Information Engineering, Cornell University. |
| Pseudocode | Yes | Algorithm 1 EXP3-HT-IX Policy 1: Input: η (0, 1): learning rate, β [0, 1/2): IX parameter for the HT-IX estimator, (ℓ, r): parameters for the RRP. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | No | We consider a 2-armed setting with N units lying on a N lattice. We generate unit-level interference as follows. Each unit u [N] is assigned a random reward Rut. |
| Dataset Splits | No | The paper describes generating synthetic data and running experiments on 'instances' but does not provide specific training/test/validation dataset splits for a pre-existing dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions comparing 'EXP3-IX' and 'EXP3-IX-HT' policies, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | We consider a 2-armed setting with N units lying on a N lattice. We generate unit-level interference as follows. Each unit u [N] is assigned a random reward Rut. Let ρut be the proportion of the five immediate neighbors (counting u itself) assigned arm 1 at time t. Then, the reward at u is (2ρut 1)cut. In two sets of experiments, we assume that N = T 2 and N = T 3 respectively, and let T range from 10, 20, . . . , 50. For each fixed N, T, we randomly generate 100 instances. To necessitate exploration, we add large-scale non-stationarity by randomly generating drifts. Each drift is an 8-piecewise constant function, where the value on each piece is independently drawn from U(0, 1). To align with the theoretical analysis, we partition the lattice into square-shaped clusters of side length N 1/4. For simplicity, we perform a simplified version of the clustering without randomly assigning the boundary units to nearby clusters. We compare the switchback version of EXP3-IX (denoted SB) and our clustered randomization-based policy, dubbed EXP3-IX-HT, and denoted CR in the figures. We evaluated the performance of the two policies by running each of them 200 times for each instance. |