Bayesian Regret Minimization in Offline Bandits

Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our numerical results on synthetic domains confirm that our approach is superior to LCB.
Researcher Affiliation | Collaboration | (1) University of New Hampshire; (2) Google Research; (3) Amazon AGI.
Pseudocode | Yes | Algorithm 1: BRMOB: Bayesian Regret Minimization for Offline Bandits
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper states “Our experiments use synthetic domains” and describes how the data is generated, but does not provide concrete access information or a citation for a publicly available or open dataset.
Dataset Splits | No | The paper mentions varying the “number of data points n” but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper states “We use MOSEK to compute the SOCP optimization” but does not specify a version number for MOSEK or any other key software dependencies.
Experiment Setup | Yes | Our experiments use synthetic domains, each defined by a normal prior N(µ0, I) and a feature matrix Φ. ... We use the error tolerance of δ = 0.1 throughout. ... We execute Scenario with 4000 samples from the posterior.
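
The experiment setup quoted above (a Gaussian over the parameters, a feature matrix Φ, δ = 0.1, and 4000 samples) can be illustrated with a minimal sampling sketch. This is not the paper's BRMOB algorithm, and it is only one plausible reading of a scenario-style baseline. It assumes rewards are linear in the features, uses an identity-covariance Gaussian as a stand-in for the posterior, and the names `sample_regret_quantiles`, `pick_action`, and the `features[a]` layout are hypothetical. It estimates, for each action, the empirical (1 − δ)-quantile of the regret and picks the action with the smallest one.

```python
import math
import random


def sample_regret_quantiles(mu, features, delta=0.1, n_samples=4000, seed=0):
    """Empirical (1 - delta)-quantile of the regret of each action.

    mu:       mean of a Gaussian over the parameter vector; identity
              covariance is an assumption standing in for the posterior.
    features: features[a] is the feature vector of action a
              (hypothetical layout for the feature matrix).
    """
    rng = random.Random(seed)
    n_actions = len(features)
    regrets = [[] for _ in range(n_actions)]

    for _ in range(n_samples):
        # Draw one parameter vector; coordinates are independent with unit variance.
        theta = [rng.gauss(m, 1.0) for m in mu]
        # Linear reward of each action under this draw.
        rewards = [sum(f * t for f, t in zip(phi, theta)) for phi in features]
        best = max(rewards)
        for a, r in enumerate(rewards):
            regrets[a].append(best - r)

    # Index of the empirical (1 - delta)-quantile in the sorted sample.
    k = min(n_samples - 1, max(0, math.ceil((1 - delta) * n_samples) - 1))
    return [sorted(regrets[a])[k] for a in range(n_actions)]


def pick_action(quantiles):
    """Return the index of the action with the smallest regret quantile."""
    return min(range(len(quantiles)), key=quantiles.__getitem__)


if __name__ == "__main__":
    # Two actions with one-hot features; action 0 has the higher mean reward.
    q = sample_regret_quantiles([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print("regret quantiles:", q)
    print("chosen action:", pick_action(q))
```

The fixed seed keeps the estimate reproducible across runs. With a real posterior, the per-coordinate draw would be replaced by a draw from the actual posterior mean and covariance.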