PAC-Bayesian Offline Contextual Bandits With Guarantees
Authors: Otmane Sakhi, Pierre Alquier, Nicolas Chopin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios. (Section 6, Experiments) |
| Researcher Affiliation | Collaboration | 1Criteo AI Lab, Paris, France 2CREST, ENSAE, IPP, Palaiseau, France 3ESSEC Business School, Asia-Pacific campus, Singapore. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | We use two multiclass datasets: Fashion-MNIST (Xiao et al., 2017) and EMNIST-b (Cohen et al., 2017), alongside two multilabel datasets: NUS-WIDE-128 (Chua et al., 2009) with 128-VLAD features (Spyromitros-Xioufis et al., 2014) and Mediamill (Snoek et al., 2006) to empirically validate our findings. The statistics of the datasets are described in Table 1 in Appendix B.1. |
| Dataset Splits | Yes | We split the training split Dtrain (of size N) of the four datasets considered into Dl (nl = 0.05N) and Dc (nc = 0.95N) and use their test split Dtest. The detailed statistics of the different splits can be found in Table 1. (A sketch of this split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' as an optimizer but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The logging policy π0. π0 is trained on Dl (supervised manner) with the following parameters: We use L2 regularization of 10^-6. We use Adam (Kingma & Ba, 2014) with a learning rate of 10^-1 for 10 epochs. Optimising the bounds. All the bounds are optimized with the following parameters: The clipping parameter τ is fixed to 1/K with K the action size of the dataset. We use Adam (Kingma & Ba, 2014) with a learning rate of 10^-3 for 100 epochs. For the bounds optimized over LIG policies, the gradient is a one-dimensional integral, and is approximated using S = 32 samples. (An illustrative optimization sketch follows the table.) |
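
To make the quoted "Dataset Splits" row concrete, the snippet below shows the 5% / 95% split of each training set into Dl (logging-policy training) and Dc (logged bandit feedback). This is a minimal sketch, not the authors' code: the function name `split_logged_data`, the use of NumPy, and the fixed random seed are assumptions for illustration; only the split proportions come from the quoted setup.

```python
import numpy as np

def split_logged_data(X, y, frac_logging=0.05, seed=0):
    """Split Dtrain (size N) into Dl (n_l = 0.05 N, used to train the logging
    policy pi_0) and Dc (n_c = 0.95 N, used as logged feedback), per the quote."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_l = int(frac_logging * len(X))
    return (X[idx[:n_l]], y[idx[:n_l]]), (X[idx[n_l:]], y[idx[n_l:]])

# Hypothetical usage on a synthetic stand-in for one of the four datasets:
X = np.random.randn(1000, 32)
y = np.random.randint(0, 10, size=1000)
D_l, D_c = split_logged_data(X, y)
print(len(D_l[0]), len(D_c[0]))  # 50, 950
```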
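
The "Experiment Setup" row can likewise be read as an optimization loop. The sketch below reuses only the quoted values (clipping τ = 1/K, Adam with learning rate 10^-3 for 100 epochs); the linear softmax policy, the synthetic logged data, and the simple weight-decay penalty standing in for the paper's PAC-Bayesian bound are all assumptions, not the authors' objective.

```python
import torch

K = 10                     # number of actions (example value, dataset-dependent)
TAU = 1.0 / K              # clipping parameter fixed to 1/K, as quoted
LR, EPOCHS = 1e-3, 100     # quoted Adam learning rate and number of epochs

# Toy logged bandit data (placeholders, not from the paper).
n, d = 1000, 16
contexts = torch.randn(n, d)
actions = torch.randint(0, K, (n,))
costs = torch.rand(n)
pi0_props = torch.full((n,), 1.0 / K)          # logging propensities, uniform stand-in

theta = torch.zeros(d, K, requires_grad=True)  # linear softmax policy parameters
optimizer = torch.optim.Adam([theta], lr=LR)

for _ in range(EPOCHS):
    optimizer.zero_grad()
    probs = torch.softmax(contexts @ theta, dim=1)       # pi_theta(a | x)
    pi_a = probs[torch.arange(n), actions]
    weights = pi_a / torch.clamp(pi0_props, min=TAU)     # clipped importance weights
    risk = (costs * weights).mean()                      # clipped IPS risk estimate
    penalty = 1e-3 * (theta ** 2).sum()  # stand-in for the paper's KL-based bound term
    (risk + penalty).backward()
    optimizer.step()
```

Per the quoted setup, the objective actually minimized in the paper is a PAC-Bayesian upper bound on the policy's risk rather than this regularized estimate, and for LIG policies the gradient involves a one-dimensional integral that the authors approximate with S = 32 samples.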