PAC-Bayesian Offline Contextual Bandits With Guarantees

Authors: Otmane Sakhi, Pierre Alquier, Nicolas Chopin

ICML 2023

Reproducibility assessment (Variable: Result — LLM Response):
Research Type: Experimental — "We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios." (Section 6, Experiments)
Researcher Affiliation: Collaboration — Criteo AI Lab, Paris, France; CREST, ENSAE, IPP, Palaiseau, France; ESSEC Business School, Asia-Pacific campus, Singapore.
Pseudocode: No — The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No — The paper does not provide access to source code for the described methodology.
Open Datasets: Yes — "We use two multiclass datasets: Fashion-MNIST (Xiao et al., 2017) and EMNIST-b (Cohen et al., 2017), alongside two multilabel datasets: NUS-WIDE-128 (Chua et al., 2009) with 128-VLAD features (Spyromitros-Xioufis et al., 2014) and Mediamill (Snoek et al., 2006) to empirically validate our findings. The statistics of the datasets are described in Table 1 in Appendix B.1."
Dataset Splits: Yes — "We split the training split Dtrain (of size N) of the four datasets considered into Dl (nl = 0.05N) and Dc (nc = 0.95N) and use their test split Dtest. The detailed statistics of the different splits can be found in Table 1."
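For concreteness, here is a minimal sketch of how the open datasets and the quoted 5% / 95% split could be reproduced. The paper releases no code, so the torchvision loaders and the `make_splits` helper below are illustrative assumptions; only the split ratios come from the quoted text.

```python
# Illustrative sketch only: the paper does not release code. The torchvision
# loaders are one way to obtain two of the cited datasets; `make_splits` is a
# hypothetical helper implementing the quoted 5% / 95% partition.
import numpy as np
from torchvision import datasets

fmnist = datasets.FashionMNIST(root="data", train=True, download=True)
emnist_b = datasets.EMNIST(root="data", split="balanced", train=True, download=True)

def make_splits(X, y, seed=0):
    """Split a training set of size N into Dl (nl = 0.05N, for training the
    logging policy) and Dc (nc = 0.95N, for optimizing the bounds)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_l = int(0.05 * len(X))
    idx_l, idx_c = perm[:n_l], perm[n_l:]
    return (X[idx_l], y[idx_l]), (X[idx_c], y[idx_c])
```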
Hardware Specification: No — The paper does not provide specific hardware details (such as GPU/CPU models or memory specifications) used for running its experiments.
Software Dependencies: No — The paper mentions using Adam (Kingma & Ba, 2014) as an optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup: Yes — "The logging policy π0 is trained on Dl (in a supervised manner) with the following parameters: we use L2 regularization of 10^-6, and Adam (Kingma & Ba, 2014) with a learning rate of 10^-1 for 10 epochs. Optimising the bounds: all the bounds are optimized with the following parameters: the clipping parameter τ is fixed to 1/K, with K the action size of the dataset, and we use Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 for 100 epochs. For the bounds optimized over LIG policies, the gradient is a one-dimensional integral, and is approximated using S = 32 samples."
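The quoted hyperparameters translate directly into optimizer settings. Below is a hedged PyTorch sketch: the model, data loaders, and `bound_objective` are hypothetical placeholders, since the actual implementation is not public; only the learning rates, epoch counts, L2 weight, clipping level τ = 1/K, and S = 32 Monte Carlo samples come from the quote.

```python
# Hedged PyTorch sketch of the quoted settings; `bound_objective` and the
# loaders are hypothetical placeholders, not the authors' code.
import torch

def train_logging_policy(model, loader, epochs=10):
    # Supervised training of pi_0 on Dl; weight_decay supplies the quoted
    # L2 regularization of 1e-6, with Adam at learning rate 1e-1.
    opt = torch.optim.Adam(model.parameters(), lr=1e-1, weight_decay=1e-6)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

def optimize_bound(policy, bound_objective, loader, K, epochs=100):
    # PAC-Bayesian bound minimization: the importance weights inside
    # `bound_objective` are assumed clipped at tau = 1/K (K = action size),
    # optimized with Adam at learning rate 1e-3 for 100 epochs.
    tau = 1.0 / K
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            bound_objective(policy, batch, tau).backward()
            opt.step()
    return policy

def mc_gradient_1d(integrand, S=32):
    # For LIG policies the bound's gradient is a one-dimensional integral;
    # a standard-Gaussian Monte Carlo estimate with S = 32 samples is one
    # plausible reading of the quoted approximation (the integration
    # measure here is an assumption).
    eps = torch.randn(S)
    return torch.stack([integrand(e) for e in eps]).mean()
```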