Marginal Posterior Sampling for Slate Bandits
Authors: Maria Dimakopoulou, Nikos Vlassis, Tony Jebara
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results establish substantial advantages of marginal posterior sampling over alternative Thompson sampling-based approaches that are widely used in the domain of web services. In a range of experiments we demonstrate that marginal posterior sampling has significantly better performance in terms of cumulative reward compared to generalized linear bandits, improving up to 30%. At the same time, marginal posterior sampling can make a slate decision up to 70 times faster than generalized linear bandits. |
| Researcher Affiliation | Industry | Maria Dimakopoulou , Nikos Vlassis , Tony Jebara Netflix {mdimakopoulou, nvlassis, tjebara}@netflix.com |
| Pseudocode | Yes | Algorithm 1 Slate Bandit as K-Armed Bernoulli Bandit, Algorithm 2 Slate Bandit as Generalized Linear Bandit, and Algorithm 3 Marginal Posterior Sampling are explicitly provided. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | No | The paper uses simulated data where 'the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]' and 'cjj Uniform([10, 20])'. It does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper states, 'The results are averaged over 1000 simulations' over a horizon of T=50000 time periods, but it does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper mentions 'Duration of a single slate decision' in milliseconds but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries or solvers) needed to replicate the experiments. |
| Experiment Setup | Yes | For the additive link function case, Figure 1 shows the cumulative regret of each algorithm for slates with ℓ= 2 slots and m = 2, 3, 4, 5 base actions per slot. The results are averaged over 1000 simulations and 95% confidence intervals are shown. ... the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]... initial α(s) and β(s) for all s S (default value: 1) ... T = 50000 time periods. |