CO-BED: Information-Theoretic Contextual Optimization via Bayesian Experimental Design

Authors: Desi R. Ivanova, Joel Jennings, Tom Rainforth, Cheng Zhang, Adam Foster

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate its effectiveness in a number of experiments, where CO-BED demonstrates competitive performance even when compared to bespoke, model-specific alternatives. We demonstrate the benefits of CO-BED in a series of experiments. Even when compared against bespoke, model-specific alternatives, we find it consistently performs on par or better, highlighting its effectiveness as a highly applicable and efficient solution. We further find it is able to scale gracefully, with effective performance maintained on a problem with a 5000-dimensional design space. Our results showcase the promising potential of CO-BED as an off-the-shelf tool for contextual optimization in various settings.
Researcher Affiliation | Collaboration | Desi R. Ivanova (1), Joel Jennings (2), Tom Rainforth (1), Cheng Zhang (2), Adam Foster (2). 1: Department of Statistics, University of Oxford; 2: Microsoft Research. Work partially conducted during an internship at Microsoft Research Cambridge. Correspondence to: Desi R. Ivanova <desi.ivanova@stats.ox.ac.uk>, Adam Foster <adam.e.foster@microsoft.com>.
Pseudocode | Yes | Algorithm 1: CO-BED. (A hedged sketch of a critic-based training loop of this general shape is given after the table.)
Open Source Code | Yes | Code is available at https://github.com/microsoft/co-bed.
Open Datasets | No | The paper mentions using "observational data" and various models like "Gaussian Process (GP)" or "Contextual Linear Bandits" that imply data, but it does not provide concrete access information (link, DOI, specific repository name, or formal citation with authors/year) for any publicly available dataset used for training. For example, Section 5.2 states: "We created 100 initial observational data points; this data was then held fixed across all experiment runs and seeds.", but no public access is given.
Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or absolute counts) that would be needed for direct reproduction, nor does it reference predefined splits with citations for specific datasets in the main text. It mentions "evaluation contexts", but not in the sense of standard dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used to run its experiments.
Software Dependencies | Yes | We implement all experiments in Pyro (Bingham et al., 2018), which is a probabilistic programming framework on top of PyTorch (Paszke et al., 2019). Our code will be open-sourced upon publication. We used the Adam optimiser (Kingma & Ba, 2014) with an initial learning rate of 0.001 and exponential learning rate annealing with a coefficient of 0.96 applied every 1000 steps. (A minimal PyTorch sketch of this optimiser configuration is given after the table.)
Experiment Setup | Yes | Training details: All experiment baselines ran for 50K gradient steps, using a batch size of 2048. We used the Adam optimiser (Kingma & Ba, 2014) with an initial learning rate of 0.001 and exponential learning rate annealing with a coefficient of 0.96 applied every 1000 steps. We used a separable critic architecture (Poole et al., 2019) with simple MLP encoders with ReLU activations and 32 output units. For the discrete treatment example: we added batch norm to the critic architecture, which helped to stabilise the optimisation. We had one hidden layer of size 512. Additionally, for the Gumbel-Softmax policy, we started with a temperature τ = 2.0 and a hard=False constraint. We applied temperature annealing every 10K steps with a factor of 0.5. We switch to hard=True in the last 10K steps of training. For the continuous treatment example: we used MLPs with hidden layers of sizes [design dimension 2; 412; 256] and 32 output units. (A hedged sketch of the critic architecture and temperature schedule is given after the table.)
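On the Pseudocode row: Algorithm 1 is not reproduced here, but the training details quoted in the table describe a critic optimised jointly with a design policy. The sketch below shows one plausible shape for such a loop, using an InfoNCE-style mutual-information lower bound with a separable critic. Every network, dimension, simulator, and sampling routine in it is an illustrative assumption rather than the authors' Algorithm 1; see https://github.com/microsoft/co-bed for the actual implementation.

```python
import math
import torch
import torch.nn as nn

def infonce_lower_bound(f_emb, g_emb):
    """InfoNCE lower bound on mutual information with a separable critic:
    the score for pair (i, j) is the dot product of the two embeddings,
    and the diagonal holds the jointly sampled (positive) pairs."""
    scores = f_emb @ g_emb.T                                       # [B, B]
    return (scores.diag() - torch.logsumexp(scores, dim=1)).mean() + math.log(scores.shape[0])

# Toy stand-ins: all shapes, names, and distributions below are illustrative.
context_dim, design_dim, latent_dim, outcome_dim, emb_dim = 3, 2, 4, 2, 32
policy_net = nn.Sequential(nn.Linear(context_dim, 512), nn.ReLU(), nn.Linear(512, design_dim))
critic_f = nn.Sequential(nn.Linear(outcome_dim + context_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))
critic_g = nn.Sequential(nn.Linear(latent_dim + context_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))
simulator = nn.Linear(design_dim + context_dim + latent_dim, outcome_dim)  # differentiable toy simulator

params = list(policy_net.parameters()) + list(critic_f.parameters()) + list(critic_g.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def training_step(batch_size=2048):
    """One joint gradient step on the design policy and the critic."""
    context = torch.randn(batch_size, context_dim)                 # contexts from their distribution
    latents = torch.randn(batch_size, latent_dim)                  # model parameters from the prior
    designs = policy_net(context)                                  # amortised contextual designs
    outcomes = simulator(torch.cat([designs, context, latents], dim=-1))
    # Maximise the MI lower bound between outcomes and the latent quantity of interest;
    # a differentiable simulator lets gradients reach the policy parameters.
    bound = infonce_lower_bound(critic_f(torch.cat([outcomes, context], dim=-1)),
                                critic_g(torch.cat([latents, context], dim=-1)))
    loss = -bound
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return bound.item()
```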
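On the Software Dependencies row: the quoted optimiser settings (Adam, initial learning rate 0.001, exponential annealing by 0.96 every 1000 steps, 50K steps, batch size 2048) map directly onto standard PyTorch components. A minimal sketch, with the network and loss as placeholders:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the joint policy + critic parameters.
model = nn.Sequential(nn.Linear(8, 512), nn.ReLU(), nn.Linear(512, 32))

# Adam with an initial learning rate of 0.001, as quoted above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Exponential annealing: multiply the learning rate by 0.96 every 1000 gradient steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.96)

for step in range(50_000):                              # paper: 50K gradient steps
    loss = model(torch.randn(2048, 8)).pow(2).mean()    # dummy loss; batch size 2048 as quoted
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                    # advance the learning-rate schedule once per step
```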
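On the Experiment Setup row: the quoted separable critic (Poole et al., 2019) with MLP encoders, ReLU activations, and 32 output units, together with the Gumbel-Softmax temperature schedule (start at τ = 2.0, halve every 10K steps), could look roughly as follows. Hidden sizes other than the quoted 512, and all class and function names, are assumptions rather than the released code.

```python
import torch
import torch.nn as nn

class MLPEncoder(nn.Module):
    """Simple MLP encoder with ReLU activations and 32 output units, as quoted
    in the experiment setup (input and hidden dimensions are assumed)."""
    def __init__(self, in_dim, hidden_dim=512, out_dim=32, use_batchnorm=False):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden_dim)]
        if use_batchnorm:                       # paper: batch norm added for the discrete treatment case
            layers.append(nn.BatchNorm1d(hidden_dim))
        layers += [nn.ReLU(), nn.Linear(hidden_dim, out_dim)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class SeparableCritic(nn.Module):
    """Separable critic: two encoders whose dot products give the
    critic scores for every pair in the batch."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.f = MLPEncoder(x_dim)
        self.g = MLPEncoder(y_dim)

    def forward(self, x, y):
        return self.f(x) @ self.g(y).T          # [B, B] score matrix

def gumbel_softmax_temperature(step, tau0=2.0, anneal_every=10_000, factor=0.5):
    """Quoted schedule: start at τ = 2.0 and multiply by 0.5 every 10K steps."""
    return tau0 * factor ** (step // anneal_every)

# Example of drawing a relaxed discrete design with the annealed temperature;
# `logits` would come from the (hypothetical) design policy network.
logits = torch.randn(2048, 4)
design = torch.nn.functional.gumbel_softmax(
    logits, tau=gumbel_softmax_temperature(step=25_000), hard=False
)
```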