Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits

Authors: Muhammad Faaiz Taufiq, Arnaud Doucet, Rob Cornish, Jean-François Ton

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on synthetic and real-world datasets corroborate our theoretical findings and highlight the practical advantages of the MR estimator in OPE for contextual bandits.
Researcher Affiliation | Collaboration | Muhammad Faaiz Taufiq (Department of Statistics, University of Oxford); Arnaud Doucet (Department of Statistics, University of Oxford); Rob Cornish (Department of Statistics, University of Oxford); Jean-François Ton (ByteDance Research)
Pseudocode | No | The paper describes its methods using mathematical formulations and textual descriptions but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce our experiments has been made available at: github.com/faaizT/MR-OPE.
Open Datasets | Yes | We consider five UCI classification datasets [37] as well as the MNIST [38] and CIFAR-100 [39] datasets.
Dataset Splits | No | The paper refers to 'training' and 'evaluation' datasets (the latter serving as test sets) with specific sizes m and n, but it does not define a separate validation split for purposes such as hyperparameter tuning, nor does it report three-way split percentages.
Hardware Specification | Yes | We ran our experiments on an Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz with 8GB RAM per core.
Software Dependencies | No | The paper mentions software components such as random forests and multi-layer perceptrons (MLPs) but does not provide version numbers for these or for any other libraries or frameworks used.
Experiment Setup | Yes | For our synthetic data experiment, we reproduce the experimental setup of [14] by reusing their code with minor modifications. Specifically, the context space X ⊆ ℝ^d for various values of d as described below. Likewise, the action space A = {0, …, n_a − 1}, with n_a taking a range of different values. Additional details regarding the reward function, behaviour policy π_b, and the estimation of weights ŵ(y) are included in Appendix F.2 for completeness. ... For MR, we split the training data to estimate π̂_b and ŵ(y), whereas for all other baselines we use the entire training data to estimate π̂_b, for a fair comparison. ... We used a fully connected neural network with three hidden layers of 512, 256, and 32 nodes respectively (with ReLU activations) to estimate the weights ŵ(y).
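For concreteness, below is a minimal PyTorch sketch of such a weight network. Only the architecture (three hidden layers of 512, 256 and 32 units with ReLU) comes from the excerpt above; the module and function names, the training loop, and the assumption that ŵ(y) is obtained by regressing per-sample policy ratios π_e(a|x)/π̂_b(a|x) onto the rewards y are illustrative only and are not taken from the paper.

```python
# Hypothetical sketch -- not the authors' released code.
import torch
import torch.nn as nn


class WeightNet(nn.Module):
    """MLP with hidden layers of 512, 256 and 32 units (ReLU), mapping a
    scalar reward y to an estimate of the weight w(y)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, y):
        return self.net(y)


def fit_weight_net(y, ratio_targets, epochs=200, lr=1e-3):
    """Fit w_hat(y) by least-squares regression of per-sample policy ratios
    (assumed here to be pi_e(a|x) / pi_b_hat(a|x)) onto the rewards y.

    y, ratio_targets: float tensors of shape (n, 1).
    """
    model = WeightNet()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(y), ratio_targets)
        loss.backward()
        opt.step()
    return model
```

A call such as fit_weight_net(y_tensor, ratio_tensor) with (n, 1)-shaped tensors returns the fitted network, which can then be evaluated on the rewards in the evaluation data to produce the weights used by the MR estimator.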