reproducibilityindex.ai

Marginal Posterior Sampling for Slate Bandits

Authors: Maria Dimakopoulou, Nikos Vlassis, Tony Jebara

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulation results establish substantial advantages of marginal posterior sampling over alternative Thompson sampling-based approaches that are widely used in the domain of web services. In a range of experiments we demonstrate that marginal posterior sampling has significantly better performance in terms of cumulative reward compared to generalized linear bandits, improving up to 30%. At the same time, marginal posterior sampling can make a slate decision up to 70 times faster than generalized linear bandits.
Researcher Affiliation	Industry	Maria Dimakopoulou, Nikos Vlassis, Tony Jebara; Netflix; {mdimakopoulou, nvlassis, tjebara}@netflix.com
Pseudocode	Yes	Algorithm 1 (Slate Bandit as K-Armed Bernoulli Bandit), Algorithm 2 (Slate Bandit as Generalized Linear Bandit), and Algorithm 3 (Marginal Posterior Sampling) are explicitly provided.
Open Source Code	No	The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets	No	The paper uses simulated data where 'the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]' and 'c_jj ∼ Uniform([10, 20])'. It does not refer to a publicly available dataset with concrete access information.
Dataset Splits	No	The paper states, 'The results are averaged over 1000 simulations' over a horizon of T = 50000 time periods, but it does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or references to predefined splits.
Hardware Specification	No	The paper mentions 'Duration of a single slate decision' in milliseconds but does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory specifications used for running experiments.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries or solvers) needed to replicate the experiments.
Experiment Setup	Yes	For the additive link function case, Figure 1 shows the cumulative regret of each algorithm for slates with ℓ = 2 slots and m = 2, 3, 4, 5 base actions per slot. The results are averaged over 1000 simulations and 95% confidence intervals are shown. ... the slot-action marginal expected rewards are drawn uniformly from the interval [0.05, 0.15]... initial α(s) and β(s) for all s ∈ S (default value: 1) ... T = 50000 time periods.
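The simulated setting described in the "Open Datasets" and "Experiment Setup" rows can be sketched as follows. This is a minimal illustration, not the authors' code. `make_instance`, `optimal_slate`, and `num_slates` are hypothetical helpers. The per-slot argmax is the best slate only under the assumption that the slate reward increases in every chosen slot-action marginal, as with an additive link. The `c_jj ∼ Uniform([10, 20])` draw is not modeled because its role is not described on this page.

```python
import random


def make_instance(num_slots, num_actions, low=0.05, high=0.15, seed=0):
    """Draw every slot-action marginal expected reward uniformly from [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(num_actions)]
            for _ in range(num_slots)]


def optimal_slate(marginals):
    """Per-slot argmax of the marginals.

    ASSUMPTION: the slate reward is increasing in each chosen marginal
    (e.g. an additive link), so slots can be optimized independently.
    """
    return [max(range(len(slot)), key=slot.__getitem__) for slot in marginals]


def num_slates(marginals):
    """Number of distinct slates, i.e. arms if each slate is treated as one arm."""
    count = 1
    for slot in marginals:
        count *= len(slot)
    return count


if __name__ == "__main__":
    # Hypothetical instance with l = 2 slots and m = 5 actions per slot.
    marginals = make_instance(num_slots=2, num_actions=5)
    print(optimal_slate(marginals), num_slates(marginals))
```

With ℓ = 2 slots and m = 5 actions there are 5² = 25 slates to track as separate arms in a K-armed formulation such as Algorithm 1. A per-slot model tracks only ℓ·m = 10 slot-action marginals.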
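Algorithm 3 (Marginal Posterior Sampling) is listed but not reproduced on this page. The sketch below is a hedged reading of the idea under stated assumptions, so consult Algorithm 3 for the exact procedure. It keeps one Beta posterior per slot-action pair, initialized to Beta(1, 1) to match the default α = β = 1, applies Thompson sampling independently within each slot, and credits the binary slate-level reward to every chosen slot-action. The environment is also assumed: the slate reward is Bernoulli with mean equal to the average of the chosen marginals. `marginal_posterior_sampling` is a hypothetical name.

```python
import random


def marginal_posterior_sampling(marginals, horizon, seed=0):
    """Hypothetical sketch of marginal posterior sampling on a simulated slate bandit.

    marginals[i][a] is the hidden expected reward of action a in slot i.
    ASSUMPTION: slate reward ~ Bernoulli(mean of the chosen marginals), and
    each chosen slot-action's Beta posterior gets the same slate-level reward.
    Returns the final Beta parameters and the total reward collected.
    """
    rng = random.Random(seed)
    num_slots = len(marginals)
    # Beta(1, 1) prior for every slot-action pair (default alpha = beta = 1).
    alpha = [[1.0] * len(slot) for slot in marginals]
    beta = [[1.0] * len(slot) for slot in marginals]
    total_reward = 0
    for _ in range(horizon):
        # 1. Sample each slot-action marginal from its posterior and pick
        #    the best action independently in every slot.
        slate = []
        for i in range(num_slots):
            samples = [rng.betavariate(a, b) for a, b in zip(alpha[i], beta[i])]
            slate.append(max(range(len(samples)), key=samples.__getitem__))
        # 2. Observe a binary slate-level reward from the assumed environment.
        mean = sum(marginals[i][a] for i, a in enumerate(slate)) / num_slots
        reward = 1 if rng.random() < mean else 0
        total_reward += reward
        # 3. Credit the slate reward to each chosen slot-action's posterior.
        for i, a in enumerate(slate):
            alpha[i][a] += reward
            beta[i][a] += 1 - reward
    return alpha, beta, total_reward


if __name__ == "__main__":
    alpha, beta, total = marginal_posterior_sampling(
        [[0.05, 0.15], [0.10, 0.06, 0.12]], horizon=5000)
    print(total, alpha, beta)
```

In this sketch a decision costs ℓ·m Beta draws, one per slot-action pair, rather than a pass over all slates.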
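The regret curves are averaged over 1000 simulations with 95% confidence intervals, but the page does not say how the intervals are computed. This sketch assumes a common choice: the normal approximation mean ± 1.96·s/√n at each time step. It also takes regret to be expected (pseudo-)regret, meaning the best expected slate reward minus the chosen slate's expected reward. `cumulative_regret` and `mean_and_ci95` are hypothetical helpers.

```python
import math
import statistics
from itertools import accumulate


def cumulative_regret(best_mean, chosen_means):
    """Running sum of the per-step expected regret best_mean - chosen_mean."""
    return list(accumulate(best_mean - m for m in chosen_means))


def mean_and_ci95(curves):
    """Pointwise mean and normal-approximation 95% CI half-width across runs.

    curves: equal-length cumulative-regret curves, one per simulation
    (at least two, since the sample standard deviation is used).
    """
    n = len(curves)
    means, half_widths = [], []
    for column in zip(*curves):  # values of all runs at one time step
        means.append(statistics.mean(column))
        half_widths.append(1.96 * statistics.stdev(column) / math.sqrt(n))
    return means, half_widths


if __name__ == "__main__":
    # Two toy runs with best expected slate reward 0.15 (illustrative numbers only).
    runs = [cumulative_regret(0.15, [0.05, 0.15, 0.10]),
            cumulative_regret(0.15, [0.10, 0.10, 0.15])]
    print(mean_and_ci95(runs))
```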
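The "Hardware Specification" row notes that per-decision durations are reported in milliseconds without the machine they were measured on. The following is a minimal sketch of how a reproduction could pair each timing with environment details. `time_decision_ms`, `environment_report`, and the placeholder decision function are hypothetical and not part of the paper. Only standard-library calls (`time.perf_counter`, `platform`, `os.cpu_count`) are used.

```python
import os
import platform
import time


def time_decision_ms(decide, repeats=100):
    """Mean wall-clock time of one call to decide(), in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        decide()
    return (time.perf_counter() - start) * 1000.0 / repeats


def environment_report():
    """Machine and interpreter details to publish next to any timing table."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string on some platforms
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    # Placeholder workload standing in for one slate decision of any algorithm.
    ms = time_decision_ms(lambda: sorted(range(1000)))
    print(f"{ms:.4f} ms per decision", environment_report())
```

Averaging over many repeats smooths out timer resolution and scheduling noise. Relative speedups such as the reported 70x are still only comparable when all algorithms run on the same recorded machine.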