Marginal Posterior Sampling for Slate Bandits

Authors: Maria Dimakopoulou, Nikos Vlassis, Tony Jebara

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Simulation results establish substantial advantages of marginal posterior sampling over alternative Thompson sampling-based approaches that are widely used in the domain of web services. In a range of experiments we demonstrate that marginal posterior sampling has significantly better performance in terms of cumulative reward compared to generalized linear bandits, improving up to 30%. At the same time, marginal posterior sampling can make a slate decision up to 70 times faster than generalized linear bandits.
Researcher Affiliation Industry Maria Dimakopoulou , Nikos Vlassis , Tony Jebara Netflix {mdimakopoulou, nvlassis, tjebara}@netflix.com
Pseudocode Yes Algorithm 1 Slate Bandit as K-Armed Bernoulli Bandit, Algorithm 2 Slate Bandit as Generalized Linear Bandit, and Algorithm 3 Marginal Posterior Sampling are explicitly provided.
Open Source Code No The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets No The paper uses simulated data where 'the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]' and 'cjj Uniform([10, 20])'. It does not refer to a publicly available dataset with concrete access information.
Dataset Splits No The paper states, 'The results are averaged over 1000 simulations' over a horizon of T=50000 time periods, but it does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or references to predefined splits.
Hardware Specification No The paper mentions 'Duration of a single slate decision' in milliseconds but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory specifications used for running experiments.
Software Dependencies No The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries or solvers) needed to replicate the experiments.
Experiment Setup Yes For the additive link function case, Figure 1 shows the cumulative regret of each algorithm for slates with ℓ= 2 slots and m = 2, 3, 4, 5 base actions per slot. The results are averaged over 1000 simulations and 95% confidence intervals are shown. ... the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]... initial α(s) and β(s) for all s S (default value: 1) ... T = 50000 time periods.