Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning
Authors: Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our algorithms in simulations |
| Researcher Affiliation | Collaboration | ¹Cornell University and Cornell Tech; ²Tsinghua University; ³Arena Technologies and New York University. |
| Pseudocode | Yes | Algorithm 1 Localized Doubly Robust DROPE; Algorithm 2 Continuum Doubly Robust DROPL |
| Open Source Code | Yes | Code is available at https://github.com/CausalML/doubly-robust-dropel. |
| Open Datasets | No | The paper describes a simulated data generating process and does not use or provide access information for a publicly available or open dataset. |
| Dataset Splits | Yes | Randomly split $D$ into $K$ (approximately) even folds, with the indices of the $k$-th fold denoted as $I_k$. All models were fitted with $K = 5$ fold cross-fitting. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions using the "LightGBM package (Ke et al., 2017)" and "Adam with a learning rate of 0.01" but does not provide specific version numbers for these software components or other libraries. |
| Experiment Setup | Yes | The state space is two-dimensional, $\mathcal{S} = [-1, 1]^2$, and states are sampled uniformly, $S \sim \mathrm{Unif}([-1, 1]^2)$. The action space is $\mathcal{A} = \{0, 1, \dots, 4\}$, and the behavior policy is a softmax policy $\pi_0(a \mid s) \propto \exp(2 s^\top \beta_a)$, where the $\beta_a$'s are the coordinates of the $k$-th fifth root of unity, i.e. $\beta_a = (\operatorname{Re} \zeta_a, \operatorname{Im} \zeta_a)$ where $\zeta_a = \exp(2 a \pi i / 5)$. Potential outcomes are normally distributed: $R(a) \mid S = s \sim \mathcal{N}(s^\top \beta_a, \sigma_a^2)$, where $\sigma = [0.1, 0.2, 0.3, 0.4, 0.5]$. We conducted experiments under three uncertainty set radii $\delta = 0.1, 0.2, 0.3$, and in two settings, where propensities $\pi_0$ were known and unknown. All models were fitted with $K = 5$ fold cross-fitting. In CDR$^2$OPL, the continuum of regression functions $\{\hat{f}_0(s, a; \alpha)\}$ was estimated according to Section 4.1, with weights $\hat{\omega}_i(s, a)$ derived from fitting a Random Forest with 25 trees. Our policies were neural network softmax policies with a hidden layer of 32 neurons and ReLU activation. For Line 10, we minimized $\widehat{W}^{\mathrm{DR}}(\cdot, \alpha)$ using Adam with a learning rate of 0.01. Following Dudík et al. (2011), we repeated each policy update ten times with perturbed starting weights and picked the best weights based on training objective. |
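
The simulated data generating process quoted in the Experiment Setup row, together with the K = 5 fold split from the Dataset Splits row, can be sketched as follows. This is a minimal illustration written from the quoted description only, not the authors' code (see the repository linked above for that). The function names `simulate` and `k_folds`, the seeds, and the sample size are assumptions introduced here, and the sketch reads the softmax exponent as the inner product of the state with β_a.

```python
import numpy as np


def simulate(n, seed=0):
    """Draw n logged samples (s, a, r) from the quoted simulation design.

    Hypothetical helper: name, signature, and seed are illustrative only.
    """
    rng = np.random.default_rng(seed)
    n_actions = 5

    # beta_a = (Re zeta_a, Im zeta_a) with zeta_a = exp(2 a pi i / 5).
    angles = 2.0 * np.pi * np.arange(n_actions) / n_actions
    beta = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (5, 2)

    # Outcome standard deviations, one per action.
    sigma = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

    # States are uniform on [-1, 1]^2.
    s = rng.uniform(-1.0, 1.0, size=(n, 2))

    # Softmax behavior policy: pi_0(a | s) proportional to exp(2 * s . beta_a).
    logits = 2.0 * s @ beta.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Sample one action per row by inverting the cumulative distribution.
    u = rng.uniform(size=(n, 1))
    a = (u > probs.cumsum(axis=1)).sum(axis=1)
    a = np.minimum(a, n_actions - 1)  # guard against floating-point rounding

    # Observed reward: R(a) | S = s ~ Normal(s . beta_a, sigma_a^2).
    mean = (s * beta[a]).sum(axis=1)
    r = rng.normal(mean, sigma[a])
    return s, a, r, probs


def k_folds(n, k=5, seed=0):
    """Randomly split indices 0..n-1 into k approximately even folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


if __name__ == "__main__":
    s, a, r, probs = simulate(1000)
    folds = k_folds(len(s), k=5)
    print(s.shape, a.shape, r.shape, [len(f) for f in folds])
```

The returned `probs` array holds the true behavior propensities, which corresponds to the "known propensities" setting; in the "unknown" setting they would be estimated from the logged data instead.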