Conformal Off-Policy Prediction in Contextual Bandits

Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 experiments; We start with synthetic experiments and an ablation study, in order to dissect and understand our proposed methodology in more detail; Table 1a shows the coverages of different methods as the policy shift b increases; Table 1b shows the mean interval lengths
Researcher Affiliation | Collaboration | Muhammad Faaiz Taufiq* (Department of Statistics, University of Oxford); Jean-Francois Ton* (AI-Lab-Research, Bytedance AI Lab); Rob Cornish (Department of Statistics, University of Oxford); Yee Whye Teh (Department of Statistics, University of Oxford); Arnaud Doucet (Department of Statistics, University of Oxford)
Pseudocode | Yes | Algorithm 1: Conformal Off-Policy Prediction (COPP). (A hedged sketch of the weighted-conformal step behind the algorithm appears after the table.)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See section D.
Open Datasets | Yes | We now apply COPP to a real dataset, i.e. the Microsoft Ranking dataset 30k [16; 25; 4].
Dataset Splits | Yes | We generate observational data $\mathcal{D}_{\mathrm{obs}} = \{x_i, a_i, y_i\}_{i=1}^{n_{\mathrm{obs}}}$, which is then split into training ($\mathcal{D}_{\mathrm{tr}}$) and calibration ($\mathcal{D}_{\mathrm{cal}}$) datasets of sizes m and n respectively. (The split is shown at the top of the sketch after the table.)
Hardware Specification | Yes | All experiments were performed on a machine with Intel Core i7-10700K CPU, 64 GB of RAM and NVIDIA RTX 3090 GPU.
Software Dependencies | Yes | All models were implemented in Python 3.8 and PyTorch 1.10.
Experiment Setup | Yes | For training the NNs, we use the Adam optimizer with learning rate 1e-3, batch size 256, and train for 500 epochs. We also use early stopping with a patience of 10 epochs. (A minimal training-loop sketch with these hyper-parameters follows the table.)
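
The following is a minimal sketch of the weighted split-conformal step at the core of COPP (Algorithm 1), referenced in the Pseudocode and Dataset Splits rows above. It is not the authors' implementation: the synthetic data, the hard-coded behaviour and target policies (pi_b, pi_t), and the oracle stand-in for a fitted model (f) are all illustrative assumptions. What it shows is only the mechanics the paper describes: split the observational data, compute nonconformity scores on the calibration set, reweight them by the importance ratio pi_t / pi_b, and take a weighted conformal quantile.

    import numpy as np

    rng = np.random.default_rng(0)

    def weighted_conformal_quantile(scores, weights, alpha):
        # (1 - alpha)-quantile of the weighted score distribution, with the
        # usual extra point mass at +infinity (simplified here to mass
        # 1 / (sum(weights) + 1)) so that finite-sample coverage is retained.
        order = np.argsort(scores)
        s, w = scores[order], weights[order]
        cdf = np.cumsum(w / (w.sum() + 1.0))
        idx = np.searchsorted(cdf, 1.0 - alpha)
        return s[idx] if idx < len(s) else np.inf

    # Placeholder observational data: contexts x, actions a drawn from the
    # behaviour policy, outcomes y. All distributions here are invented.
    n_obs = 2000
    x = rng.normal(size=n_obs)
    a = (rng.random(n_obs) < 0.7).astype(int)
    y = x + a + rng.normal(scale=0.5, size=n_obs)

    pi_b = lambda a: np.where(a == 1, 0.7, 0.3)   # behaviour policy (assumed known)
    pi_t = lambda a: np.where(a == 1, 0.4, 0.6)   # target policy being evaluated

    # Split into training (size m) and calibration (size n), as in the paper.
    m = n_obs // 2
    x_tr, a_tr, y_tr = x[:m], a[:m], y[:m]        # would be used to fit the model
    x_cal, a_cal, y_cal = x[m:], a[m:], y[m:]

    f = lambda x, a: x + a                        # oracle stand-in for a fitted NN

    scores = np.abs(y_cal - f(x_cal, a_cal))      # nonconformity scores
    weights = pi_t(a_cal) / pi_b(a_cal)           # importance weights pi_t / pi_b

    q = weighted_conformal_quantile(scores, weights, alpha=0.1)
    x_new, a_new = 0.5, 1                         # a_new would be drawn from pi_t
    print(f"interval: [{f(x_new, a_new) - q:.2f}, {f(x_new, a_new) + q:.2f}]")

The extra point mass at infinity in the quantile is the standard weighted-conformal correction for finite-sample coverage; the full COPP algorithm additionally marginalises over actions sampled from the target policy, which this sketch omits.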
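Similarly, a hypothetical PyTorch training loop consistent with the reported setup (Adam, learning rate 1e-3, batch size 256, up to 500 epochs, early stopping with patience 10). The architecture and dummy data are placeholders; only the hyper-parameters come from the paper.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    torch.manual_seed(0)
    X = torch.randn(4096, 8)                      # dummy features
    Y = X.sum(dim=1, keepdim=True)                # dummy regression target
    train_dl = DataLoader(TensorDataset(X[:3072], Y[:3072]),
                          batch_size=256, shuffle=True)
    val_dl = DataLoader(TensorDataset(X[3072:], Y[3072:]), batch_size=256)

    model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # reported optimiser / lr
    loss_fn = nn.MSELoss()

    best_val, bad_epochs, patience = float("inf"), 0, 10  # patience of 10 epochs
    for epoch in range(500):                              # at most 500 epochs
        model.train()
        for xb, yb in train_dl:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(xb), yb).item()
                      for xb, yb in val_dl) / len(val_dl)
        if val < best_val:
            best_val, bad_epochs = val, 0                 # improvement: reset patience
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                    # stop after 10 stale epochs
                break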