Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bayesian Regret Minimization in Offline Bandits
Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical results on synthetic domains confirm that our approach is superior to LCB. |
| Researcher Affiliation | Collaboration | ¹University of New Hampshire, ²Google Research, ³Amazon AGI. |
| Pseudocode | Yes | Algorithm 1: BRMOB: Bayesian Regret Minimization for Offline Bandits |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper states “Our experiments use synthetic domains” and describes how the data is generated, but does not provide concrete access information or a citation for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions varying the “number of data points n” but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states “We use MOSEK to compute the SOCP optimization” but does not specify a version number for MOSEK or any other key software dependencies. |
| Experiment Setup | Yes | Our experiments use synthetic domains, each defined by a normal prior (µ0, I) and a feature matrix Φ. ... We use the error tolerance of δ = 0.1 throughout. ... We execute Scenario with 4000 samples from the posterior. |
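
The Experiment Setup row can be made concrete with a minimal sketch. It is not the authors' code and it implements neither BRMOB nor their Scenario baseline. It only illustrates the quoted ingredients: a normal prior with mean µ0 and identity covariance, a feature matrix Φ, the tolerance δ = 0.1, and 4000 posterior samples. Everything else is an assumption made for illustration: the dimensions, the uniformly logged arms, the unit Gaussian observation noise, and the helper names `make_domain`, `gaussian_posterior`, and `sampled_lower_quantiles`.

```python
import numpy as np


def make_domain(d=4, k=6, n=50, noise_std=1.0, seed=0):
    """Build a hypothetical synthetic offline linear bandit (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    mu0 = np.zeros(d)                        # prior mean; the prior covariance is I
    Phi = rng.normal(size=(d, k))            # column a holds the features of arm a
    theta = rng.normal(loc=mu0, scale=1.0)   # true parameter drawn from N(mu0, I)
    arms = rng.integers(0, k, size=n)        # assumed logging policy: uniform arms
    X = Phi[:, arms].T                       # (n, d) features of the logged arms
    y = X @ theta + noise_std * rng.normal(size=n)  # noisy logged rewards
    return mu0, Phi, X, y


def gaussian_posterior(mu0, X, y, noise_std=1.0):
    """Conjugate Gaussian posterior over theta under the N(mu0, I) prior."""
    d = mu0.shape[0]
    precision = np.eye(d) + X.T @ X / noise_std**2
    cov = np.linalg.inv(precision)
    mean = cov @ (mu0 + X.T @ y / noise_std**2)
    return mean, cov


def sampled_lower_quantiles(mean, cov, Phi, delta=0.1, m=4000, seed=1):
    """Empirical delta-quantile of each arm's reward from m posterior samples."""
    rng = np.random.default_rng(seed)
    thetas = rng.multivariate_normal(mean, cov, size=m)  # (m, d) posterior draws
    rewards = thetas @ Phi                               # (m, k) reward per arm
    return np.quantile(rewards, delta, axis=0)


mu0, Phi, X, y = make_domain()
mean, cov = gaussian_posterior(mu0, X, y)
lower = sampled_lower_quantiles(mean, cov, Phi, delta=0.1, m=4000)
best_arm = int(np.argmax(lower))  # arm with the highest sampled lower quantile
```

Picking the arm with the highest sampled lower quantile is only a simple pessimistic rule in the spirit of a lower-confidence-bound baseline. The paper's own method instead minimizes Bayesian regret and, per the Software Dependencies row, solves an SOCP with MOSEK.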