PAC-Bayesian Offline Contextual Bandits With Guarantees

Authors: Otmane Sakhi, Pierre Alquier, Nicolas Chopin

ICML 2023

Reproducibility assessment (Variable: Result — LLM Response):
Research Type: Experimental — "We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios." (Section 6, Experiments)
Researcher Affiliation: Collaboration — Criteo AI Lab, Paris, France; CREST, ENSAE, IPP, Palaiseau, France; ESSEC Business School, Asia-Pacific campus, Singapore.
Pseudocode: No — The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No — The paper does not provide access to source code for the described methodology.
Open Datasets: Yes — "We use two multiclass datasets: Fashion-MNIST (Xiao et al., 2017) and EMNIST-b (Cohen et al., 2017), alongside two multilabel datasets: NUS-WIDE-128 (Chua et al., 2009) with 128-VLAD features (Spyromitros-Xioufis et al., 2014) and Mediamill (Snoek et al., 2006) to empirically validate our findings. The statistics of the datasets are described in Table 1 in Appendix B.1."
Dataset Splits: Yes — "We split the training split Dtrain (of size N) of the four datasets considered into Dl (nl = 0.05N) and Dc (nc = 0.95N) and use their test split Dtest. The detailed statistics of the different splits can be found in Table 1."
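For concreteness, here is a minimal sketch of how the open datasets and the quoted 5% / 95% split could be reproduced. The paper releases no code, so the torchvision loaders and the `make_splits` helper below are illustrative assumptions; only the split ratios come from the quoted text.

```python
# Illustrative sketch only: the paper does not release code. The torchvision
# loaders are one way to obtain two of the cited datasets; `make_splits` is a
# hypothetical helper implementing the quoted 5% / 95% partition.
import numpy as np
from torchvision import datasets

fmnist = datasets.FashionMNIST(root="data", train=True, download=True)
emnist_b = datasets.EMNIST(root="data", split="balanced", train=True, download=True)

def make_splits(X, y, seed=0):
    """Split a training set of size N into Dl (nl = 0.05N, for training the
    logging policy) and Dc (nc = 0.95N, for optimizing the bounds)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_l = int(0.05 * len(X))
    idx_l, idx_c = perm[:n_l], perm[n_l:]
    return (X[idx_l], y[idx_l]), (X[idx_c], y[idx_c])
```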
Hardware Specification: No — The paper does not provide specific hardware details (such as GPU/CPU models or memory specifications) used for running its experiments.
Software Dependencies: No — The paper mentions using Adam (Kingma & Ba, 2014) as an optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup: Yes — "The logging policy π0 is trained on Dl (in a supervised manner) with the following parameters: we use L2 regularization of 10^-6, and Adam (Kingma & Ba, 2014) with a learning rate of 10^-1 for 10 epochs. Optimising the bounds: all the bounds are optimized with the following parameters: the clipping parameter τ is fixed to 1/K, with K the action size of the dataset, and we use Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 for 100 epochs. For the bounds optimized over LIG policies, the gradient is a one-dimensional integral, and is approximated using S = 32 samples."
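The quoted hyperparameters translate directly into optimizer settings. Below is a hedged PyTorch sketch: the model, data loaders, and `bound_objective` are hypothetical placeholders, since the actual implementation is not public; only the learning rates, epoch counts, L2 weight, clipping level τ = 1/K, and S = 32 Monte Carlo samples come from the quote.

```python
# Hedged PyTorch sketch of the quoted settings; `bound_objective` and the
# loaders are hypothetical placeholders, not the authors' code.
import torch

def train_logging_policy(model, loader, epochs=10):
    # Supervised training of pi_0 on Dl; weight_decay supplies the quoted
    # L2 regularization of 1e-6, with Adam at learning rate 1e-1.
    opt = torch.optim.Adam(model.parameters(), lr=1e-1, weight_decay=1e-6)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def optimize_bound(policy, bound_objective, loader, K, epochs=100):
    # PAC-Bayesian bound minimization: the importance weights inside
    # `bound_objective` are assumed clipped at tau = 1/K (K = action size),
    # optimized with Adam at learning rate 1e-3 for 100 epochs.
    tau = 1.0 / K
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            bound_objective(policy, batch, tau).backward()
            opt.step()
    return policy

def mc_gradient_1d(integrand, S=32):
    # For LIG policies the bound's gradient is a one-dimensional integral;
    # a standard-Gaussian Monte Carlo estimate with S = 32 samples is one
    # plausible reading of the quoted approximation (the integration
    # measure here is an assumption).
    eps = torch.randn(S)
    return torch.stack([integrand(e) for e in eps]).mean()
```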