Causal Contextual Bandits with Targeted Interventions

Authors: Chandrasekar Subramanian, Balaraman Ravindran

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new algorithm, which we show empirically performs better than baselines both in experiments that use purely synthetic data and in real-world-inspired experiments. We also prove a bound on regret that theoretically guarantees performance.
Researcher Affiliation | Collaboration | Chandrasekar Subramanian¹,², Balaraman Ravindran¹,². ¹Robert Bosch Centre for Data Science and Artificial Intelligence; ²Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India. sekarnet@gmail.com, ravi@cse.iitm.ac.in
Pseudocode | Yes | Algorithm 1a: Training phase of Unc CCB. Algorithm 1b: Evaluation phase of Unc CCB. (A minimal sketch of this two-phase structure appears after the table.)
Open Source Code | Yes | The source code of all the experiments in the main paper, along with a README on how to run them, is provided as part of the Supplementary Materials in a file named Supplemental Code Paper1203.zip.
Open Datasets | Yes | The dataset used to calibrate the real-world-inspired experiments (see Section 4.2 of the main paper) is made available in anonymized form at https://doi.org/10.5281/zenodo.5540348 under the CC BY-NC 4.0 license. (A download sketch appears after the table.)
Dataset Splits | No | The paper describes training and evaluation phases but does not explicitly define or refer to standard train/validation/test splits, nor does it mention a dedicated validation set for model tuning or hyperparameter selection.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general experimental settings.
Software Dependencies | No | The paper states that the source code is provided in the supplementary materials but does not explicitly list the software dependencies or their specific version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow) within the text.
Experiment Setup | Yes | We choose the fraction of Phase 1 rounds α = 0.5, as we empirically found that to be a good choice. Further, we also do a warm start for all algorithms to account for the fact that, in practice, there are often some starting beliefs about the CPDs from past data, domain knowledge, etc., which can be encoded into the agent's prior beliefs. Specifically, we ran pure random exploration for 15 rounds at the beginning of each algorithm and updated all CPD beliefs; this simulates a warm start. Due to this, we used α = 0 for our algorithm. For the specific parameterizations of all settings used in all the experiments, refer to the README file in Supplemental Code Paper1203.zip as part of the Supplemental Material. (A sketch of the warm start appears after the table.)
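
Regarding the Pseudocode row: Algorithms 1a/1b split learning into a training phase of targeted interventions followed by a greedy evaluation phase. The sketch below only illustrates that two-phase shape under assumed details; the Beta-parameterized CPD beliefs, the variance-based uncertainty proxy, and the true_reward stand-in environment are all inventions for illustration, not the paper's actual method.

```python
# Hypothetical sketch of the two-phase structure of Algorithms 1a/1b.
# The Beta beliefs, uncertainty proxy, and toy environment are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_interventions = 5
# Beta(alpha, beta) belief over each intervention's Bernoulli reward CPD.
cpd_beliefs = np.ones((n_interventions, 2))  # columns: (alpha, beta)

def uncertainty(a, b):
    # Variance of a Beta(a, b) posterior as a simple uncertainty proxy.
    return a * b / ((a + b) ** 2 * (a + b + 1))

def true_reward(arm):
    # Stand-in environment; the paper's environments are causal graphs.
    means = np.linspace(0.2, 0.8, n_interventions)
    return float(rng.random() < means[arm])

# Training phase (cf. Algorithm 1a): target the intervention whose CPD
# belief is currently most uncertain, then update that belief.
for t in range(200):
    arm = int(np.argmax([uncertainty(a, b) for a, b in cpd_beliefs]))
    r = true_reward(arm)
    cpd_beliefs[arm] += (r, 1 - r)  # Bayesian update of (alpha, beta)

# Evaluation phase (cf. Algorithm 1b): act greedily with respect to the
# posterior-mean reward implied by the learned CPD beliefs.
posterior_means = cpd_beliefs[:, 0] / cpd_beliefs.sum(axis=1)
print("chosen intervention:", int(np.argmax(posterior_means)))
```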
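Regarding the Open Datasets row: a minimal sketch of pulling the record's files via the Zenodo REST API follows. The record id is read off the DOI suffix above; the response shape assumed here ("files" entries with "key" and "links.self") reflects Zenodo's public API but should be verified against the record page.

```python
# Hedged sketch: fetch the calibration dataset behind
# https://doi.org/10.5281/zenodo.5540348 via the Zenodo REST API.
# The API response layout is an assumption; check the record page.
import requests

record_id = "5540348"  # taken from the DOI suffix
meta = requests.get(f"https://zenodo.org/api/records/{record_id}")
meta.raise_for_status()

for f in meta.json().get("files", []):
    name, url = f["key"], f["links"]["self"]
    print("downloading", name)
    with open(name, "wb") as out:
        out.write(requests.get(url).content)
```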
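Regarding the Experiment Setup row: this sketch illustrates the described warm start, i.e., 15 rounds of uniform-random interventions that update the CPD beliefs before the bandit algorithm proper begins, together with the Phase 1 fraction α. It reuses the same assumed Beta beliefs and toy environment as the first sketch; none of it is the authors' code (their README carries the real parameterizations).

```python
# Hedged sketch of the warm start: 15 rounds of pure random exploration
# to seed the CPD beliefs. The Beta parameterization and toy environment
# are illustrative assumptions, as in the earlier sketch.
import numpy as np

rng = np.random.default_rng(1)
n_interventions = 5
cpd_beliefs = np.ones((n_interventions, 2))  # Beta(alpha, beta) priors

WARM_START_ROUNDS = 15  # "pure random exploration for 15 rounds"
ALPHA = 0.5             # fraction of Phase 1 rounds (0.0 when warm-started)
BUDGET = 100            # hypothetical total interaction budget

def true_reward(arm):
    means = np.linspace(0.2, 0.8, n_interventions)
    return float(rng.random() < means[arm])

for _ in range(WARM_START_ROUNDS):
    arm = int(rng.integers(n_interventions))  # uniform-random intervention
    r = true_reward(arm)
    cpd_beliefs[arm] += (r, 1 - r)            # update the sampled CPD belief

n_phase1 = int(ALPHA * BUDGET)  # exploration rounds; 0 when ALPHA = 0
print("Phase 1 rounds after warm start:", n_phase1)
```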