Conformal Off-Policy Prediction in Contextual Bandits
Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 experiments; We start with synthetic experiments and an ablation study, in order to dissect and understand our proposed methodology in more detail; Table 1a shows the coverages of different methods as the policy shift b increases; Table 1b shows the mean interval lengths |
| Researcher Affiliation | Collaboration | Muhammad Faaiz Taufiq* (Department of Statistics, University of Oxford); Jean-Francois Ton* (AI-Lab-Research, Bytedance AI Lab); Rob Cornish (Department of Statistics, University of Oxford); Yee Whye Teh (Department of Statistics, University of Oxford); Arnaud Doucet (Department of Statistics, University of Oxford) |
| Pseudocode | Yes | Algorithm 1: Conformal Off-Policy Prediction (COPP); a hedged sketch of the idea appears below the table |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See section D. |
| Open Datasets | Yes | We now apply COPP to a real dataset, i.e. the Microsoft Ranking dataset 30k [16, 25, 4]. |
| Dataset Splits | Yes | we generate observational data $\mathcal{D}_{\text{obs}} = \{x_i, a_i, y_i\}_{i=1}^{n_{\text{obs}}}$, which is then split into training ($\mathcal{D}_{\text{tr}}$) and calibration ($\mathcal{D}_{\text{cal}}$) datasets, of sizes $m$ and $n$ respectively |
| Hardware Specification | Yes | All experiments were performed on a machine with Intel Core i7-10700K CPU, 64 GB of RAM and NVIDIA RTX 3090 GPU. |
| Software Dependencies | Yes | All models were implemented in Python 3.8 and PyTorch 1.10. |
| Experiment Setup | Yes | For training the NNs, we use the Adam optimizer with a learning rate of 1e-3 and a batch size of 256, and train for 500 epochs. We also use early stopping with a patience of 10 epochs. A training-loop sketch matching this setup appears below the table. |
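
The COPP procedure referenced in the Pseudocode row is, at its core, weighted split conformal prediction: calibration scores are reweighted by the ratio of target-policy to behaviour-policy action probabilities to correct for policy shift. Below is a minimal sketch of that idea, not the paper's exact Algorithm 1; the absolute-residual score, the `mu_hat`, `pi_target`, `pi_behav`, and `sample_target_action` callables, and the Monte Carlo union over target-policy actions are all illustrative assumptions.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, w_test, alpha=0.1):
    """(1 - alpha) quantile of the weighted empirical score distribution,
    with the test point's probability mass placed at +infinity, as in
    weighted split conformal prediction."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    p = np.append(w, w_test)
    p = p / p.sum()
    cdf = np.cumsum(p[:-1])  # CDF over the sorted calibration scores
    idx = np.searchsorted(cdf, 1.0 - alpha, side="left")
    return s[idx] if idx < len(s) else np.inf

def copp_interval(x_test, sample_target_action, pi_target, pi_behav,
                  mu_hat, cal_data, alpha=0.1, n_mc=100):
    """Sketch of a COPP-style predictive interval for the outcome of a new
    context under the target policy. All callables are assumptions here:
    pi_target(a, x) and pi_behav(a, x) return action probabilities, and
    mu_hat(x, a) is an outcome model fitted on the training split."""
    X_cal, A_cal, Y_cal = cal_data
    # Conformity scores on the calibration set (absolute residuals are a
    # simple stand-in for the paper's score function).
    scores = np.abs(Y_cal - mu_hat(X_cal, A_cal))
    # Policy-ratio weights correcting for the behaviour-to-target shift.
    w_cal = pi_target(A_cal, X_cal) / pi_behav(A_cal, X_cal)
    lo, hi = np.inf, -np.inf
    # Monte Carlo over target-policy actions at the test context; taking
    # the union of per-action intervals is a simplification of the paper's
    # construction.
    for _ in range(n_mc):
        a = sample_target_action(x_test)
        w_test = pi_target(a, x_test) / pi_behav(a, x_test)
        q = weighted_conformal_quantile(scores, w_cal, w_test, alpha)
        center = float(mu_hat(x_test, a))
        lo, hi = min(lo, center - q), max(hi, center + q)
    return lo, hi
```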
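
Likewise, the Experiment Setup row pins down the optimization loop. The sketch below mirrors those quoted settings (Adam, learning rate 1e-3, batch size 256, up to 500 epochs, early stopping with patience 10); the model architecture and the synthetic stand-in data are hypothetical placeholders, not the paper's models or datasets.

```python
import torch
from torch import nn

# Synthetic stand-in data; the real experiments use the datasets above.
torch.manual_seed(0)
X = torch.randn(2048, 10)
y = X.sum(dim=1, keepdim=True) + 0.1 * torch.randn(2048, 1)
train_ds = torch.utils.data.TensorDataset(X[:1536], y[:1536])
X_val, y_val = X[1536:], y[1536:]

# Placeholder architecture; optimizer settings follow the quoted setup.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = torch.utils.data.DataLoader(train_ds, batch_size=256, shuffle=True)

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        nn.functional.mse_loss(model(xb), yb).backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(X_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping, patience of 10 epochs
            break
```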