Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Authors: Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our algorithms in simulations.
Researcher Affiliation | Collaboration | Cornell University and Cornell Tech; Tsinghua University; Arena Technologies and New York University.
Pseudocode | Yes | Algorithm 1: Localized Doubly Robust DROPE; Algorithm 2: Continuum Doubly Robust DROPL.
Open Source Code | Yes | Code is available at https://github.com/CausalML/doubly-robust-dropel.
Open Datasets | No | The paper describes a simulated data-generating process and does not use or provide access information for a publicly available or open dataset.
Dataset Splits | Yes | Randomly split D into K (approximately) even folds, with the indices of the kth fold denoted as I_k. All models were fitted with K = 5 fold cross-fitting. (A cross-fitting split sketch follows this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments.
Software Dependencies | No | The paper mentions using the "LightGBM package (Ke et al., 2017)" and "Adam with a learning rate of 0.01" but does not provide specific version numbers for these software components or other libraries.
Experiment Setup | Yes | The state space is two-dimensional, S = [-1, 1]^2, and states are sampled uniformly, S ~ Unif([-1, 1]^2). The action space is A = {0, 1, ..., 4}, and the behavior policy is a softmax policy π0(a | s) ∝ exp(2 s⊤βa), where the βa are the coordinates of the a-th fifth root of unity, i.e. βa = (Re ζa, Im ζa) with ζa = exp(2aπi/5). Potential outcomes are normally distributed: R(a) | S = s ~ N(s⊤βa, σa²), where σ = [0.1, 0.2, 0.3, 0.4, 0.5]. We conducted experiments under three uncertainty-set radii δ = 0.1, 0.2, 0.3, and in two settings, where the propensities π0 were known and unknown. All models were fitted with K = 5 fold cross-fitting. In CDR2OPL, the continuum of regression functions {f̂0(s, a; α)} was estimated according to Section 4.1, with weights ω̂i(s, a) derived from fitting a Random Forest with 25 trees. Our policies were neural-network softmax policies with a hidden layer of 32 neurons and ReLU activation. For Line 10, we minimized Ŵ_DR(π, α) using Adam with a learning rate of 0.01. Following Dudík et al. (2011), we repeated each policy update ten times with perturbed starting weights and picked the best weights based on the training objective. (A simulation sketch of this data-generating process follows this table.)
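
The K = 5 fold cross-fitting referenced in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code; it only assumes a logged dataset indexed 0, ..., n-1 and uses scikit-learn's KFold to produce (approximately) even folds I_k. The helper name fit_nuisance in the commented usage is a hypothetical placeholder for whatever nuisance-model fitting routine is used.

```python
# Minimal sketch (not the authors' code) of a K = 5 fold cross-fitting split.
import numpy as np
from sklearn.model_selection import KFold

def cross_fit_folds(n, K=5, seed=0):
    """Randomly split indices {0, ..., n-1} into K (approximately) even folds I_k."""
    kf = KFold(n_splits=K, shuffle=True, random_state=seed)
    # Each element is (train indices = D \ I_k, held-out indices = I_k).
    return list(kf.split(np.arange(n)))

# Cross-fitting usage: fit nuisances on D \ I_k and evaluate them only on I_k.
# for train_idx, eval_idx in cross_fit_folds(n=1000):
#     model_k = fit_nuisance(train_idx)              # hypothetical fitting routine
#     nuisance_values[eval_idx] = model_k.predict(eval_idx)
```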
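
The data-generating process in the Experiment Setup row can likewise be sketched. This is a minimal simulation under the stated assumptions, not the authors' implementation: it draws uniform two-dimensional states, samples actions from the softmax behavior policy with fifth-root-of-unity coefficients βa, and draws Gaussian outcomes with the per-action noise levels σ.

```python
# Minimal sketch (not the authors' code) of the simulated data-generating process:
# S ~ Unif([-1, 1]^2), behavior policy pi_0(a | s) proportional to exp(2 s^T beta_a)
# with beta_a the a-th fifth root of unity, and R(a) | S = s ~ N(s^T beta_a, sigma_a^2).
import numpy as np

rng = np.random.default_rng(0)
n_actions = 5
zeta = np.exp(2j * np.pi * np.arange(n_actions) / n_actions)   # fifth roots of unity
beta = np.stack([zeta.real, zeta.imag], axis=1)                 # beta_a = (Re zeta_a, Im zeta_a)
sigma = np.array([0.1, 0.2, 0.3, 0.4, 0.5])                     # per-action outcome noise

def sample_logged_data(n):
    s = rng.uniform(-1.0, 1.0, size=(n, 2))                     # states in [-1, 1]^2
    logits = 2.0 * s @ beta.T                                    # 2 * s^T beta_a for each action
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)                    # softmax behavior policy pi_0
    a = np.array([rng.choice(n_actions, p=p) for p in probs])   # logged actions
    r = rng.normal(np.sum(s * beta[a], axis=1), sigma[a])       # observed rewards
    return s, a, r, probs[np.arange(n), a]                       # states, actions, rewards, propensities

s, a, r, propensities = sample_logged_data(1000)
```

The returned propensities correspond to the known-π0 setting; in the unknown-propensity setting described above they would instead be estimated from the logged (s, a) pairs.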