Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

Authors: Yuta Saito, Qingyang Ren, Thorsten Joachims

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that OffCEM provides substantial improvements in OPE, especially in the presence of many actions. From Section 4 (Empirical Evaluation): We first evaluate OffCEM on synthetic data to identify the situations where it enables a more accurate OPE. Second, we validate the real-world applicability of OffCEM on extreme classification datasets, which can be converted into bandit problems with large action spaces.
Researcher Affiliation | Academia | Department of Computer Science, Cornell University, Ithaca, NY, USA. Correspondence to: Yuta Saito <ys552@cornell.edu>, Thorsten Joachims <tj@cs.cornell.edu>.
Pseudocode | No | The paper describes the two-step regression procedure and the OffCEM estimator using mathematical formulations and descriptive text, but it does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | The experiment code is publicly available on GitHub: https://github.com/usaito/icml2023-offcem.
Open Datasets | Yes | Specifically, we use EUR-Lex 4K and Wiki10-31K from the Extreme Classification Repository (Bhatia et al., 2016), available at http://manikvarma.org/downloads/XC/XMLRepository.html.
Dataset Splits | No | The paper describes generating synthetic logged data and using the full datasets for the real-world experiments. It mentions 3-fold cross-fitting for training the reward models, but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for the overall OPE experiments, so the data partitioning cannot be reproduced from the text.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions software such as Open Bandit Pipeline (OBP) and scikit-learn, along with a neural network model, but it does not provide version numbers for these components.
Experiment Setup | Yes | We use a neural network with 3 hidden layers along with 3-fold cross-fitting (Newey & Robins, 2018) to obtain $\hat{q}(x, a)$ for DR and DM, and $(\hat{h}_\theta, \hat{g}_\psi)$ for OffCEM. For the synthetic data, we use $\beta = 0.1$, set $\epsilon = 0.2$ in the main text, and use a standard deviation of $\sigma = 3$. In the real-world experiment, we use $\beta = 30$ and set $\epsilon = 0.05$. We use uniform random action clusters for OffCEM with $|\mathcal{C}| = 100$ as a heuristic.
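The Open Datasets row says the extreme classification datasets are converted into bandit problems with large action spaces. The paper's exact conversion is not given in this report, so the sketch below uses a common supervised-to-bandit recipe as an assumption: each label is an action, an action is sampled from a logging policy, and the reward is 1.0 if the sampled label is relevant for that example and 0.0 otherwise. The function name `multilabel_to_bandit` and the record layout are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def multilabel_to_bandit(
    label_sets: Sequence[Set[int]],
    policy_probs: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[Tuple[int, int, float, float]]:
    """Turn multilabel examples into logged bandit records (hypothetical helper).

    For example i, one action (label) is sampled from the logging policy's
    probabilities; the reward is 1.0 if that label is relevant for the
    example and 0.0 otherwise. Returns (i, action, reward, propensity).
    """
    logged = []
    for i, (labels, probs) in enumerate(zip(label_sets, policy_probs)):
        # Sample an action index according to the logging probabilities.
        a = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        reward = 1.0 if a in labels else 0.0  # assumed binary relevance reward
        logged.append((i, a, reward, probs[a]))
    return logged


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [{0, 2}, {1}]                       # relevant labels per example
    probs = [[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]]   # logging policy per example
    for record in multilabel_to_bandit(labels, probs, rng):
        print(record)
```

Storing the propensity alongside each record is a design choice that lets importance-weighted estimators be computed later without re-querying the logging policy.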
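The Experiment Setup row reports values for $\beta$, $\epsilon$, and $|\mathcal{C}|$ but not how they enter the policies. As an assumption for illustration only, the sketch below treats $\beta$ as the inverse temperature of a softmax logging policy $\pi_0(a \mid x) \propto \exp(\beta\, q(x, a))$, treats $\epsilon$ as the exploration rate of an $\epsilon$-greedy target policy, and assigns each action to one of $|\mathcal{C}|$ clusters uniformly at random. All function names are hypothetical, and `q` here is synthetic, not a fitted model.

```python
import math
import random
from typing import List, Sequence


def softmax_policy(q: Sequence[float], beta: float) -> List[float]:
    """pi(a|x) proportional to exp(beta * q(x, a)); max-shifted for numerical stability."""
    scores = [beta * v for v in q]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def epsilon_greedy_policy(q: Sequence[float], eps: float) -> List[float]:
    """Greedy action gets 1 - eps extra mass; eps is spread uniformly over all actions."""
    n = len(q)
    best = max(range(n), key=lambda a: q[a])
    probs = [eps / n] * n
    probs[best] += 1.0 - eps
    return probs


def uniform_random_clusters(n_actions: int, n_clusters: int, rng: random.Random) -> List[int]:
    """Assign each action to one of n_clusters clusters uniformly at random."""
    return [rng.randrange(n_clusters) for _ in range(n_actions)]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # hypothetical expected rewards
    pi0 = softmax_policy(q, beta=30.0)                # logging policy
    pi = epsilon_greedy_policy(q, eps=0.05)           # target policy
    clusters = uniform_random_clusters(len(q), 100, rng)
    print(round(sum(pi0), 6), round(sum(pi), 6), len(set(clusters)))
```

The usage lines plug in the real-world values from the row ($\beta = 30$, $\epsilon = 0.05$, $|\mathcal{C}| = 100$); the max-shift inside `softmax_policy` keeps `math.exp` from overflowing when $\beta$ is large.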
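The Dataset Splits and Experiment Setup rows mention 3-fold cross-fitting for the reward models. The idea is that every logged record receives a prediction from a model trained only on the other folds, so no record is scored by a model that saw its own reward. The sketch below substitutes a deliberately simple per-action mean-reward model for the paper's neural network, and assigns folds by index modulo `n_folds` for determinism (a real pipeline would typically shuffle first). Both choices and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Tuple[int, int, float]  # (context, action, reward)


def fit_action_means(train: Sequence[Record]) -> Dict[int, float]:
    """Toy reward model: mean observed reward per action (ignores context)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _, a, r in train:
        sums[a] += r
        counts[a] += 1
    return {a: sums[a] / counts[a] for a in sums}


def cross_fit_predictions(
    data: Sequence[Record], n_folds: int = 3, default: float = 0.0
) -> List[float]:
    """Out-of-fold predictions: each record is scored by a model that never saw it."""
    preds = [default] * len(data)
    for k in range(n_folds):
        # Fold k holds out the records whose index i satisfies i % n_folds == k.
        train = [rec for i, rec in enumerate(data) if i % n_folds != k]
        model = fit_action_means(train)
        for i, (_, a, _) in enumerate(data):
            if i % n_folds == k:
                preds[i] = model.get(a, default)
    return preds


if __name__ == "__main__":
    data = [(0, 0, 1.0), (1, 0, 3.0), (2, 0, 5.0),
            (3, 1, 2.0), (4, 1, 4.0), (5, 1, 6.0)]
    print(cross_fit_predictions(data, n_folds=3))
    # prints [4.0, 3.0, 2.0, 5.0, 4.0, 3.0]
```

Note that the first record gets 4.0 rather than the in-sample action-0 mean of 3.0, because its own reward is excluded from the model that scores it.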
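The Pseudocode row notes that the OffCEM estimator is given only as formulas. Under our reading of the estimator, it averages $w(x_i, c_i)\,(r_i - \hat{f}(x_i, a_i)) + \mathbb{E}_{\pi(a \mid x_i)}[\hat{f}(x_i, a)]$ over the logged data, where $c_i$ is the cluster of the logged action and $w(x, c) = \pi(c \mid x) / \pi_0(c \mid x)$ is a cluster-level importance weight. The sketch below is not the authors' implementation: the regression model $\hat{f}$ is passed in as a plain callable (how it is fit, e.g. the paper's two-step regression, is outside this sketch), clusters are assumed context-independent, and the logging policy must give every cluster positive probability to avoid division by zero. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical aliases: a policy maps a context id to action probabilities,
# and a logged record is (context, action, reward).
Policy = Callable[[int], List[float]]
Record = Tuple[int, int, float]


def cluster_prob(probs: Sequence[float], clusters: Sequence[int], c: int) -> float:
    """Marginal probability of cluster c: sum of pi(a|x) over actions a in c."""
    return sum(p for p, ca in zip(probs, clusters) if ca == c)


def offcem_estimate(
    logged: Sequence[Record],
    pi0: Policy,
    pi: Policy,
    clusters: Sequence[int],
    f_hat: Callable[[int, int], float],
) -> float:
    """Sketch of an OffCEM-style value estimate for the target policy pi."""
    total = 0.0
    for x, a, r in logged:
        p0, p = pi0(x), pi(x)
        c = clusters[a]
        # Cluster importance weight w(x, c) = pi(c|x) / pi0(c|x).
        w = cluster_prob(p, clusters, c) / cluster_prob(p0, clusters, c)
        # Model-based term: E_{a' ~ pi(.|x)}[f_hat(x, a')].
        model_term = sum(pb * f_hat(x, b) for b, pb in enumerate(p))
        # Weighted residual corrects the model-based term.
        total += w * (r - f_hat(x, a)) + model_term
    return total / len(logged)


if __name__ == "__main__":
    clusters = [0, 0, 1, 1]                    # 4 actions in 2 clusters
    pi0 = lambda x: [0.25, 0.25, 0.25, 0.25]   # uniform logging policy
    pi = lambda x: [0.5, 0.5, 0.0, 0.0]        # target puts all mass on cluster 0
    logged = [(0, 0, 1.0), (0, 2, 0.0)]
    # With f_hat = 0 the estimate reduces to cluster-level IPS.
    print(offcem_estimate(logged, pi0, pi, clusters, lambda x, a: 0.0))
    # prints 1.0
```

The weight depends only on cluster probabilities, so with $|\mathcal{C}|$ much smaller than the number of actions it can stay bounded even when individual action probabilities under $\pi_0$ are tiny; the regression term is what accounts for reward differences within a cluster.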