Conformal Off-Policy Prediction in Contextual Bandits
Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 experiments; We start with synthetic experiments and an ablation study, in order to dissect and understand our proposed methodology in more detail; Table 1a shows the coverages of different methods as the policy shift b increases; Table 1b shows the mean interval lengths |
| Researcher Affiliation | Collaboration | Muhammad Faaiz Taufiq* (Department of Statistics, University of Oxford); Jean-Francois Ton* (AI-Lab-Research, Bytedance AI Lab); Rob Cornish (Department of Statistics, University of Oxford); Yee Whye Teh (Department of Statistics, University of Oxford); Arnaud Doucet (Department of Statistics, University of Oxford) |
| Pseudocode | Yes | Algorithm 1: Conformal Off-Policy Prediction (COPP); a hedged sketch of the idea appears below the table |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See section D. |
| Open Datasets | Yes | We now apply COPP to a real dataset, i.e. the Microsoft Ranking dataset 30k [16, 25, 4]. |
| Dataset Splits | Yes | we generate observational data $\mathcal{D}_{\text{obs}} = \{x_i, a_i, y_i\}_{i=1}^{n_{\text{obs}}}$, which is then split into training ($\mathcal{D}_{\text{tr}}$) and calibration ($\mathcal{D}_{\text{cal}}$) datasets, of sizes $m$ and $n$ respectively |
| Hardware Specification | Yes | All experiments were performed on a machine with Intel Core i7-10700K CPU, 64 GB of RAM and NVIDIA RTX 3090 GPU. |
| Software Dependencies | Yes | All models were implemented in Python 3.8 and PyTorch 1.10. |
| Experiment Setup | Yes | For training the NNs, we use the Adam optimizer with a learning rate of 1e-3 and a batch size of 256, and train for 500 epochs. We also use early stopping with a patience of 10 epochs. A training-loop sketch matching this setup appears below the table. |
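
The COPP procedure referenced in the Pseudocode row is, at its core, weighted split conformal prediction: calibration scores are reweighted by the ratio of target-policy to behaviour-policy action probabilities to correct for policy shift. Below is a minimal sketch of that idea, not the paper's exact Algorithm 1; the absolute-residual score, the `mu_hat`, `pi_target`, `pi_behav`, and `sample_target_action` callables, and the Monte Carlo union over target-policy actions are all illustrative assumptions.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, w_test, alpha=0.1):
    """(1 - alpha) quantile of the weighted empirical score distribution,
    with the test point's probability mass placed at +infinity, as in
    weighted split conformal prediction."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    p = np.append(w, w_test)
    p = p / p.sum()
    cdf = np.cumsum(p[:-1])  # CDF over the sorted calibration scores
    idx = np.searchsorted(cdf, 1.0 - alpha, side="left")
    return s[idx] if idx < len(s) else np.inf

def copp_interval(x_test, sample_target_action, pi_target, pi_behav,
                  mu_hat, cal_data, alpha=0.1, n_mc=100):
    """Sketch of a COPP-style predictive interval for the outcome of a new
    context under the target policy. All callables are assumptions here:
    pi_target(a, x) and pi_behav(a, x) return action probabilities, and
    mu_hat(x, a) is an outcome model fitted on the training split."""
    X_cal, A_cal, Y_cal = cal_data
    # Conformity scores on the calibration set (absolute residuals are a
    # simple stand-in for the paper's score function).
    scores = np.abs(Y_cal - mu_hat(X_cal, A_cal))
    # Policy-ratio weights correcting for the behaviour-to-target shift.
    w_cal = pi_target(A_cal, X_cal) / pi_behav(A_cal, X_cal)
    lo, hi = np.inf, -np.inf
    # Monte Carlo over target-policy actions at the test context; taking
    # the union of per-action intervals is a simplification of the paper's
    # construction.
    for _ in range(n_mc):
        a = sample_target_action(x_test)
        w_test = pi_target(a, x_test) / pi_behav(a, x_test)
        q = weighted_conformal_quantile(scores, w_cal, w_test, alpha)
        center = float(mu_hat(x_test, a))
        lo, hi = min(lo, center - q), max(hi, center + q)
    return lo, hi
```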
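
Likewise, the Experiment Setup row pins down the optimization loop. The sketch below mirrors those quoted settings (Adam, learning rate 1e-3, batch size 256, up to 500 epochs, early stopping with patience 10); the model architecture and the synthetic stand-in data are hypothetical placeholders, not the paper's models or datasets.

```python
import torch
from torch import nn

# Synthetic stand-in data; the real experiments use the datasets above.
torch.manual_seed(0)
X = torch.randn(2048, 10)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(2048, 1)
train_ds = torch.utils.data.TensorDataset(X[:1536], y[:1536])
X_val, y_val = X[1536:], y[1536:]

# Placeholder architecture; optimizer settings follow the quoted setup.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(train_ds, batch_size=256, shuffle=True)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        nn.functional.mse_loss(model(xb), yb).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(X_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping, patience of 10 epochs
            break
```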