Bayesian Regret Minimization in Offline Bandits
Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh
ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical results on synthetic domains confirm that our approach is superior to LCB. |
| Researcher Affiliation | Collaboration | ¹University of New Hampshire, ²Google Research, ³Amazon AGI. |
| Pseudocode | Yes | Algorithm 1: BRMOB: Bayesian Regret Minimization for Offline Bandits |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper states “Our experiments use synthetic domains” and describes how the data is generated, but does not provide concrete access information or a citation for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions varying the “number of data points n” but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states “We use MOSEK to compute the SOCP optimization” but does not specify a version number for MOSEK or any other key software dependencies. |
| Experiment Setup | Yes | Our experiments use synthetic domains, each defined by a normal prior (µ0, I) and a feature matrix Φ. ... We use the error tolerance of δ = 0.1 throughout. ... We execute Scenario with 4000 samples from the posterior. |
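
The setup quoted in the last row (a normal posterior, tolerance δ = 0.1, 4000 posterior samples) can be illustrated with a small sample-based sketch. This is not the paper's code, which is not public, and it is not BRMOB or the MOSEK-based SOCP. It makes several simplifying assumptions: each action has an independent normal posterior over its reward instead of a shared parameter mapped through the feature matrix Φ, only deterministic actions are compared, and the function names `sample_posterior`, `regret_quantile`, and `best_action` are hypothetical. The sketch picks the action whose empirical (1 − δ)-quantile of regret is smallest.

```python
import math
import random


def sample_posterior(mu, sigma, n_samples, rng):
    """Draw reward vectors from independent normal posteriors (an assumption)."""
    return [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(n_samples)]


def regret_quantile(samples, action, delta):
    """Empirical (1 - delta)-quantile of the regret of one fixed action."""
    # Regret in a sampled world: best achievable reward minus the action's reward.
    regrets = sorted(max(s) - s[action] for s in samples)
    # Smallest sorted value with at least a (1 - delta) fraction at or below it.
    k = min(len(regrets) - 1, max(0, math.ceil((1 - delta) * len(regrets)) - 1))
    return regrets[k]


def best_action(samples, delta):
    """Index of the action with the smallest regret quantile."""
    n_actions = len(samples[0])
    return min(range(n_actions), key=lambda a: regret_quantile(samples, a, delta))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical posterior means and standard deviations for three actions.
    samples = sample_posterior([0.0, 0.5, 1.0], [1.0, 0.2, 2.0], 4000, rng)
    print(best_action(samples, delta=0.1))
```

The values 4000 and 0.1 mirror the quoted setup; the means and standard deviations are placeholders, not values from the paper.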