Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
Authors: Yuta Saito, Qingyang Ren, Thorsten Joachims
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that OffCEM provides substantial improvements in OPE especially in the presence of many actions. Also, from Section 4 (Empirical Evaluation): We first evaluate OffCEM on synthetic data to identify the situations where it enables a more accurate OPE. Second, we validate real-world applicability of OffCEM on extreme classification datasets, which can be converted into bandit problems with large action spaces. |
| Researcher Affiliation | Academia | Department of Computer Science, Cornell University, Ithaca, NY, USA. Correspondence to: Yuta Saito <ys552@cornell.edu>, Thorsten Joachims <tj@cs.cornell.edu>. |
| Pseudocode | No | The paper describes the two-step regression procedure and the OffCEM estimator using mathematical formulations and descriptive text, but it does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The experiment code is publicly available on GitHub: https://github.com/usaito/icml2023-offcem. |
| Open Datasets | Yes | Specifically, we use EUR-Lex 4K and Wiki10-31K from the Extreme Classification Repository (Bhatia et al., 2016), available at http://manikvarma.org/downloads/XC/XMLRepository.html. |
| Dataset Splits | No | The paper describes generating synthetic logged data and using full datasets for real-world experiments. It mentions '3-fold cross-fitting' for training reward models, but it does not specify explicit train/test/validation dataset splits (e.g., percentages or sample counts) for the overall OPE experiments in a way that allows reproduction of data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as 'Open Bandit Pipeline (OBP)' and 'scikit-learn' (and uses a neural network model), but it does not provide specific version numbers for these components. |
| Experiment Setup | Yes | "We use a neural network with 3 hidden layers along with 3-fold cross-fitting (Newey & Robins, 2018) to obtain $\hat{q}(x, a)$ for DR and DM, and $(\hat{h}_\theta, \hat{g}_\psi)$ for OffCEM."; "we use β = 0.1", "we set ϵ = 0.2 in the main text", and "standard deviation σ = 3" (for synthetic data); "we use β = 30" and "we set ϵ = 0.05 in the real-world experiment"; "We use uniform random action clusters for OffCEM with \|C\| = 100 as a heuristic." |
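The setup row lists β, ϵ, and σ without defining them. The following is a minimal sketch of how such a synthetic logged dataset could be generated, assuming (as in closely related synthetic OPE setups, but not confirmed by this table) that β is the inverse temperature of a softmax logging policy π0(a|x) ∝ exp(β·q(x, a)), ϵ is the exploration rate of an ϵ-greedy evaluation policy, and σ is the standard deviation of Gaussian reward noise. The sizes, the random q(x, a), and the function names are hypothetical.

```python
import numpy as np

def softmax_logging_policy(q: np.ndarray, beta: float) -> np.ndarray:
    """Assumed logging policy: pi_0(a|x) proportional to exp(beta * q(x, a)); q is (n, n_actions)."""
    logits = beta * q
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    expq = np.exp(logits)
    return expq / expq.sum(axis=1, keepdims=True)

def epsilon_greedy_policy(q: np.ndarray, eps: float) -> np.ndarray:
    """Assumed evaluation policy: (1 - eps) on argmax_a q(x, a), plus eps spread uniformly."""
    n, n_actions = q.shape
    pi = np.full((n, n_actions), eps / n_actions)
    pi[np.arange(n), q.argmax(axis=1)] += 1.0 - eps
    return pi

rng = np.random.default_rng(0)
n, n_actions = 1000, 500                 # hypothetical sizes
q = rng.normal(size=(n, n_actions))      # hypothetical expected rewards q(x, a)
pi_0 = softmax_logging_policy(q, beta=0.1)
pi = epsilon_greedy_policy(q, eps=0.2)

# Log one action per context under pi_0 and add Gaussian reward noise with sigma = 3.
actions = np.array([rng.choice(n_actions, p=p) for p in pi_0])
rewards = q[np.arange(n), actions] + rng.normal(scale=3.0, size=n)
```

A small β keeps π0 close to uniform, while a large β (such as the β = 30 reported for the real-world data) would make it concentrate on high-reward actions under this assumed parameterization.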
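The table refers to the OffCEM estimator and to uniform random action clusters with |C| = 100 but does not spell out the estimator. As we understand it from the paper, OffCEM combines an importance weight on action clusters, w(x, c) = π(c|x) / π0(c|x), with a regression model f̂(x, a) for the residual effect:

$$\hat{V}_{\mathrm{OffCEM}}(\pi) = \frac{1}{n}\sum_{i=1}^{n}\Big[w\big(x_i, \phi(x_i, a_i)\big)\big(r_i - \hat{f}(x_i, a_i)\big) + \mathbb{E}_{\pi(a \mid x_i)}\big[\hat{f}(x_i, a)\big]\Big]$$

The sketch below assumes this form, context-independent clusters φ(a) drawn uniformly at random, and a precomputed f̂ matrix. The function names and the random policies are hypothetical, and the paper's two-step regression for fitting f̂ is not reproduced here.

```python
import numpy as np

def uniform_random_clusters(n_actions: int, n_clusters: int, rng: np.random.Generator) -> np.ndarray:
    """Assign each action to one of n_clusters clusters uniformly at random: phi[a] = c."""
    return rng.integers(n_clusters, size=n_actions)

def offcem_estimate(pi, pi_0, actions, rewards, phi, n_clusters, f_hat) -> float:
    """Sketch of OffCEM under the assumed form: cluster-weighted residual plus model baseline.

    pi, pi_0, f_hat: arrays of shape (n, n_actions); actions, rewards: shape (n,).
    """
    onehot = np.eye(n_clusters)[phi]            # (n_actions, n_clusters) cluster membership
    pi_c, pi0_c = pi @ onehot, pi_0 @ onehot    # cluster marginals pi(c|x), pi_0(c|x)
    idx = np.arange(len(actions))
    c = phi[actions]                            # observed cluster of each logged action
    w = pi_c[idx, c] / pi0_c[idx, c]            # cluster importance weight w(x_i, c_i)
    residual = rewards - f_hat[idx, actions]    # r_i - f_hat(x_i, a_i)
    baseline = (pi * f_hat).sum(axis=1)         # E_{pi(a|x_i)}[f_hat(x_i, a)]
    return float(np.mean(w * residual + baseline))

rng = np.random.default_rng(0)
n, n_actions, n_clusters = 1000, 1000, 100         # |C| = 100 as in the setup row
q = rng.normal(size=(n, n_actions))                # hypothetical true q(x, a)
pi_0 = rng.dirichlet(np.ones(n_actions), size=n)   # hypothetical logging policy
pi = rng.dirichlet(np.ones(n_actions), size=n)     # hypothetical evaluation policy
phi = uniform_random_clusters(n_actions, n_clusters, rng)
actions = np.array([rng.choice(n_actions, p=p) for p in pi_0])
rewards = q[np.arange(n), actions] + rng.normal(size=n)
f_hat = np.zeros_like(q)                           # trivial model; then OffCEM reduces to cluster IPS
print(offcem_estimate(pi, pi_0, actions, rewards, phi, n_clusters, f_hat))
```

Weighting by clusters rather than individual actions keeps the weights bounded by the cluster marginals, which is the motivation for this form when the action space is large.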
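The setup row mentions 3-fold cross-fitting for the reward models. A minimal sketch of K-fold cross-fitting follows, in which every sample's prediction comes from a model that never saw that sample during training. As a swapped-in simplification, a closed-form ridge regression stands in for the paper's 3-layer neural network; the function names and the synthetic features are hypothetical.

```python
import numpy as np

def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression (stand-in for the neural network reward model)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    A = X1.T @ X1 + lam * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ y)

def predict_ridge(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return X1 @ coef

def cross_fit_predictions(X: np.ndarray, y: np.ndarray, n_folds: int = 3, seed: int = 0) -> np.ndarray:
    """K-fold cross-fitting: fit on K-1 folds, predict on the held-out fold, repeat for each fold."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(X)) % n_folds  # balanced random fold label per sample
    preds = np.empty(len(X))
    for k in range(n_folds):
        held_out = folds == k
        coef = fit_ridge(X[~held_out], y[~held_out])
        preds[held_out] = predict_ridge(coef, X[held_out])
    return preds

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                              # hypothetical (context, action) features
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.1 * rng.normal(size=300)
preds = cross_fit_predictions(X, y, n_folds=3)
```

Plugging out-of-fold predictions such as `preds` into an estimator avoids reusing the same samples both to fit the reward model and to evaluate its residuals.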
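The Research Type row notes that extreme classification datasets can be converted into bandit problems with large action spaces. One common supervised-to-bandit conversion, which we assume here and which may differ in detail from the paper's code, treats each label as an action, samples an action from a logging policy, and sets the reward to 1 if the sampled label is relevant and 0 otherwise. The function name, sizes, random labels, and uniform logging policy below are hypothetical.

```python
import numpy as np

def multilabel_to_bandit(labels: np.ndarray, pi_0: np.ndarray, rng: np.random.Generator):
    """Turn multi-label data into logged bandit feedback (assumed conversion).

    labels: binary matrix (n, n_labels); each label is treated as an action.
    pi_0:   logging policy probabilities, shape (n, n_labels).
    Returns sampled actions and rewards r_i = labels[i, a_i].
    """
    n, n_actions = labels.shape
    actions = np.array([rng.choice(n_actions, p=pi_0[i]) for i in range(n)])
    rewards = labels[np.arange(n), actions].astype(float)  # 1.0 if the chosen label is relevant
    return actions, rewards

rng = np.random.default_rng(0)
n, n_labels = 500, 4000                                     # hypothetical sizes
labels = (rng.random((n, n_labels)) < 0.002).astype(int)    # hypothetical sparse label matrix
pi_0 = np.full((n, n_labels), 1.0 / n_labels)               # hypothetical uniform logging policy
actions, rewards = multilabel_to_bandit(labels, pi_0, rng)
```

Only the reward of the sampled label is kept, which is what turns full-information classification data into partial (bandit) feedback.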