Orthogonal Machine Learning: Power and Limitations

Authors: Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply these techniques in the setting of demand estimation from pricing and purchase data, where highly non-Gaussian treatment residuals are standard. In this setting, the treatment is the price of a product; commonly, conditional on all observable covariates, the treatment follows a discrete distribution representing random discounts offered to customers over a baseline price that is linear in the observables. Figure 1 portrays the results of a synthetic demand-estimation problem with dense dependence on the observables. There, the standard orthogonal moment estimator has large bias, comparable to its variance, while our second-order orthogonal moments lead to nearly unbiased estimation.
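The described data-generating process is easy to simulate. Below is a minimal Python sketch of the setting; the discount levels and probabilities, the noise scale, and the value of $\theta_0$ are illustrative assumptions rather than the paper's exact design (the sizes n, d, s follow the Experiment Setup row below).

```python
# Minimal sketch (assumed, not the authors' exact design) of the synthetic
# demand-estimation setting: price T is a baseline linear in covariates X
# plus a discrete "random discount" residual, and the purchase outcome Y
# depends linearly on T with confounding through X.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 5000, 1000, 100           # sizes taken from the Experiment Setup row
theta0 = 3.0                        # true price effect (illustrative value)

X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:s] = 1.0 / s                  # sparse linear nuisance with support size s

# Discrete treatment residual: random discounts of 0, 1, or 2 off the baseline.
eta = rng.choice([0.0, -1.0, -2.0], size=n, p=[0.6, 0.3, 0.1])
T = X @ beta + eta                  # price: baseline linear in X plus discount
Y = theta0 * T + X @ beta + rng.normal(scale=0.5, size=n)   # purchase outcome
```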
Researcher Affiliation | Collaboration | ¹Microsoft Research New England, USA; ²Operations Research Center, MIT, USA. Correspondence to: Lester Mackey <lmackey@microsoft.com>, Vasilis Syrgkanis <vasy@microsoft.com>, Ilias Zadik <izadik@mit.edu>.
Pseudocode | No | The paper provides mathematical derivations and descriptions of methods but does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Python code recreating all experiments is available at https://github.com/IliasZadik/double_orthogonal_ml.
Open Datasets | No | We generated n independent replicates of outcome Y, treatment T, and confounding covariates X.
Dataset Splits | Yes | This sample-splitting procedure proceeds as follows. 1. First stage: form an estimate $\hat h \in \mathcal{H}$ of $h_0$ using $(Z_t)_{t=n+1}^{2n}$ (e.g., by running a nonparametric or high-dimensional regression procedure). 2. Second stage: compute a Z-estimate $\hat\theta_{SS} \in \Theta$ of $\theta_0$ using an empirical version of the moment conditions (1) and $\hat h$ as a plug-in estimate of $h_0$: $\hat\theta_{SS}$ solves $\frac{1}{n}\sum_{t=1}^{n} m(Z_t, \theta, \hat h(X_t)) = 0$. ... A form of repeated sample splitting called K-fold cross-fitting (see, e.g., Chernozhukov et al., 2017) addresses both of these concerns. K-fold cross-fitting partitions the index set of the datapoints $[2n]$ into $K$ subsets $I_1, \dots, I_K$ of cardinality $2n/K$ (assuming for simplicity that $K$ divides $2n$) and produces the following two-stage estimate: ... For the first-order method, all remaining $n/2$ points were used for the second-stage estimation of $\theta_0$. For the second-order method, the moments $E[\eta^2]$ and $E[\eta^3]$ were estimated using a subsample of $n/4$ points as described in Theorem 10, and the remaining $n/4$ sample points were used for the second-stage estimation of $\theta_0$. For each method we performed cross-fitting across the first and second stages, and for the second-order method we performed nested cross-fitting between the $n/4$ subsample used for the $E[\eta^2]$ and $E[\eta^3]$ estimation and the $n/4$ subsample used for the second-stage estimation.
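The quoted cross-fitting procedure is straightforward to sketch in code. The following is a minimal illustration of K-fold cross-fitting for a first-order orthogonal (residual-on-residual) moment in a partially linear model; the Lasso first stage, the closed-form second stage, and the parameter defaults are illustrative assumptions, not the paper's exact implementation (whose second-order variant additionally estimates $E[\eta^2]$ and $E[\eta^3]$ on a separate, nested subsample).

```python
# Minimal sketch of K-fold cross-fitting for a first-order orthogonal
# (residual-on-residual) moment in a partially linear model. The Lasso
# first stage and closed-form second stage are illustrative choices.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def crossfit_theta(X, T, Y, K=2, alpha=0.1):
    """Two-stage estimate of theta0 with K-fold cross-fitting."""
    t_res, y_res = np.empty_like(T), np.empty_like(Y)
    for train, test in KFold(n_splits=K, shuffle=True, random_state=0).split(X):
        # First stage: fit the nuisances E[T|X] and E[Y|X] on the
        # complementary folds, then residualize the held-out fold.
        t_res[test] = T[test] - Lasso(alpha=alpha).fit(X[train], T[train]).predict(X[test])
        y_res[test] = Y[test] - Lasso(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
    # Second stage: solve the empirical orthogonal moment
    # (1/n) * sum_t t_res[t] * (y_res[t] - theta * t_res[t]) = 0 for theta.
    return float(t_res @ y_res / (t_res @ t_res))
```

Paired with the simulated data in the earlier sketch, `crossfit_theta(X, T, Y)` returns an estimate of $\theta_0$; using held-out folds for the nuisance fits is what removes the overfitting bias that motivates sample splitting in the first place.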
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments.
Software Dependencies | No | The paper mentions "Python code recreating all experiments" but does not specify the Python version or any other software dependencies with version numbers.
Experiment Setup | Yes | Sample size $n = 5000$, dimension of confounders $d = 1000$, support size of sparse linear nuisance functions $s = 100$. ... The regularization parameter $\lambda_n$ of each Lasso was chosen to be $\sqrt{\cdots}$.
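A configuration sketch under the stated setup follows. Because the exact $\lambda_n$ expression is truncated above, the $\sqrt{\log d / n}$ rate below is a standard theory-driven stand-in and is an assumption, not the paper's value.

```python
# Configuration sketch for the reported setup: n = 5000, d = 1000, s = 100.
# The sqrt(log d / n) regularization rate is an ASSUMED stand-in for the
# paper's (truncated) lambda_n expression, not taken from the paper.
import numpy as np
from sklearn.linear_model import Lasso

n, d, s = 5000, 1000, 100
lam = np.sqrt(np.log(d) / n)     # assumed rate, not the paper's exact constant
first_stage = Lasso(alpha=lam)   # sklearn's `alpha` plays the role of lambda_n
```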