Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Cross-Validated Off-Policy Evaluation
Authors: Matej Cief, Branislav Kveton, Michal Kompan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method empirically and show that it addresses a variety of use cases. We empirically evaluate the procedure on estimator selection and hyper-parameter tuning problems using nine real-world datasets. |
| Researcher Affiliation | Collaboration | Matej Cief (1, 2), Branislav Kveton (3), Michal Kompan (2); 1 Brno University of Technology, 2 Kempelen Institute of Intelligent Technologies, 3 Adobe Research* |
| Pseudocode | Yes | Algorithm 1: Off-policy evaluation with cross-validated estimator selection. |
| Open Source Code | Yes | Code https://github.com/navarog/cross-validated-ope |
| Open Datasets | Yes | Datasets. We take nine UCI datasets (Markelle, Longjohn, and Nottingham 2023) and convert them into contextual bandit problems |
| Dataset Splits | Yes | In K-fold CV, the dataset is split into K folds. We denote the validation data in the k-th fold by $D_k$ and all other training data by $\tilde{D}_k$. ... We split each $H$ into two halves, the bandit feedback dataset $H_b$ and policy learning dataset $H_\pi$. ... OCV is implemented as described in Algorithm 1 with K = 10. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only mentions 'The work was done at AWS AI Labs.', which is too general. |
| Software Dependencies | No | The paper mentions 'ridge regression' and 'softmax function' as techniques, but does not specify any software libraries or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn x.x). |
| Experiment Setup | Yes | The reward model $\hat{f}$ in all relevant estimators is learned using ridge regression with a regularization coefficient 0.001. ... We use $\beta_0 = 1$ for the logging policy and $\beta_1 = 10$ for the target policy. ... OCV is implemented as described in Algorithm 1 with K = 10. ... All methods are evaluated in 90 different conditions: 9 UCI ML Repository datasets (Markelle, Longjohn, and Nottingham 2023), two target policies $\beta_1 \in \{-10, 10\}$, and five logging policies $\beta_0 \in \{-3, -1, 0, 1, 3\}$. |
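The Dataset Splits row describes two nested splits: each dataset $H$ is halved into $H_b$ and $H_\pi$, and K-fold CV then holds out one fold $D_k$ at a time. The following is a minimal sketch of that index bookkeeping, assuming the folds are drawn from the bandit feedback half $H_b$ and that a seeded random shuffle is acceptable. The helper names `split_bandit_and_policy_data` and `k_fold_splits` are hypothetical and are not taken from the authors' repository.

```python
import random


def split_bandit_and_policy_data(n, seed=0):
    """Hypothetical helper: shuffle indices 0..n-1 and split them into two
    halves, the bandit feedback set H_b and the policy learning set H_pi."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]


def k_fold_splits(indices, k=10):
    """Hypothetical helper: yield (train, valid) index lists for K-fold CV.

    valid is the j-th fold (D_k in the table) and train is every other
    fold combined (the complement of that fold)."""
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for j, valid in enumerate(folds):
        train = [x for i, fold in enumerate(folds) if i != j for x in fold]
        yield train, valid


if __name__ == "__main__":
    h_b, h_pi = split_bandit_and_policy_data(1000)
    for train, valid in k_fold_splits(h_b, k=10):
        print(len(train), len(valid))  # 450 50 on every fold
```

Striding over a shuffled index list is one simple way to obtain near-equal folds; the authors' code may assign folds differently.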
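The Experiment Setup row parameterizes the logging and target policies by a single scalar β and evaluates 9 × 2 × 5 = 90 conditions. The sketch below enumerates that grid. As an assumption the table does not confirm, it treats β as an inverse temperature in a softmax over per-action scores (the Software Dependencies row mentions a softmax function but not its exact form). The dataset identifiers and the score vector are placeholders.

```python
import itertools
import math

# Placeholder identifiers: the table does not name the nine UCI datasets.
DATASETS = [f"uci_dataset_{i}" for i in range(9)]
TARGET_BETAS = [-10, 10]           # beta_1 values for the target policy
LOGGING_BETAS = [-3, -1, 0, 1, 3]  # beta_0 values for the logging policy


def softmax_policy(scores, beta):
    """Assumed policy form: P(a) proportional to exp(beta * scores[a])."""
    logits = [beta * s for s in scores]
    m = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Every (dataset, target beta, logging beta) combination: 9 * 2 * 5 = 90.
conditions = list(itertools.product(DATASETS, TARGET_BETAS, LOGGING_BETAS))

if __name__ == "__main__":
    print(len(conditions))  # 90
    scores = [0.1, 0.5, 0.9]  # hypothetical per-action scores for one context
    print(softmax_policy(scores, 0))
```

Under this assumed form, β = 0 yields a uniform policy, positive β shifts probability toward high-scoring actions, and negative β toward low-scoring ones.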