reproducibilityindex.ai

Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects

Authors: Zijun Gao, Yanjun Han

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate the efficacy of the proposed estimator in Algorithm 2 via some numerical experiments. Specifically, we aim to show that the two main ingredients of Algorithm 2, i.e. constructing pseudo-observations based on covariate matching and discarding observations with poor matching quality, are key to improved HTE estimation. We compare our estimator (which we call selected matching) with the following three estimators: the full matching estimator which never discards samples (i.e. m2 = m1 always holds in Algorithm 2), the k-NN differencing and kernel differencing estimators which apply separate k-NN or kernel estimates to both baselines and then take the difference. The performance of HTE estimation is measured via the root mean squared error (RMSE) averaged over 100 simulations. The experimental results are displayed in Figures 1 and 2.
Researcher Affiliation	Academia	Zijun Gao, Department of Statistics, Stanford University, Email: zijungao@stanford.edu; Yanjun Han, Department of Electrical Engineering, Stanford University, Email: yjhan@stanford.edu
Pseudocode	Yes	Algorithm 1: Estimator Construction under Fixed Design; Algorithm 2: Estimator Construction under Random Design
Open Source Code	Yes	The source codes are available at https://github.com/Mathegineer/Nonparametric_HTE.
Open Datasets	No	The paper uses synthetically generated data for its experiments, as described by: 'For each given (n, d, κ, σ), we generate n control covariates X^0_1, ..., X^0_n following the i.i.d. density g_0(x)... Similarly, the treatment covariates X^1_1, ..., X^1_n are i.i.d. generated following the density g_1(x) = 2 − g_0(x), and the responses Y^0_i, Y^1_i are defined in (1) with i.i.d. N(0, σ^2) noises.' It does not refer to or provide access information for a pre-existing public dataset.
Dataset Splits	No	The paper uses synthetically generated data for each simulation and does not describe explicit train/validation/test splits of a dataset.
Hardware Specification	No	The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies	No	The paper provides a link to source code but does not list specific software dependencies with version numbers within its text.
Experiment Setup	No	The paper describes the input parameter settings for the simulations (n, d, κ, σ) and states that 'The algorithm parameters are determined by the optimal bias-variance tradeoffs in theory.' However, it does not provide concrete hyperparameter values or detailed training configurations (e.g., specific m1, m2 values chosen for the experiments, or other optimization settings) for their implemented algorithms.
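
As a rough illustration of the k-NN differencing baseline and the RMSE-over-simulations metric quoted in the Research Type row, a minimal Python sketch is given here. It is not the authors' code, which lives in the repository linked in the Open Source Code row. The one-dimensional setting, the uniform covariate density on [0, 1], the baseline mu0, the effect tau, and the values of n, k and sigma are all assumptions chosen for illustration; the paper's actual densities and response functions are not reproduced in this summary.

```python
import math
import random


def knn_mean(x, xs, ys, k):
    """Average the responses of the k covariates in xs closest to x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k


def knn_differencing(x, x0, y0, x1, y1, k):
    """Estimate tau(x) as (k-NN fit on treated) minus (k-NN fit on control)."""
    return knn_mean(x, x1, y1, k) - knn_mean(x, x0, y0, k)


def simulate_rmse(n=200, k=15, sigma=0.5, n_sims=100, seed=0):
    """RMSE of the k-NN differencing estimate at x = 0.5 over n_sims draws."""
    rng = random.Random(seed)

    def mu0(x):
        # Assumed control baseline (hypothetical, not from the paper).
        return math.sin(2 * math.pi * x)

    def tau(x):
        # Assumed treatment effect (hypothetical, not from the paper).
        return x

    x_eval = 0.5
    sq_errors = []
    for _ in range(n_sims):
        # Assumed: both groups have uniform covariates on [0, 1].
        x0 = [rng.random() for _ in range(n)]
        x1 = [rng.random() for _ in range(n)]
        # Responses are baseline (plus effect for treated) plus N(0, sigma^2) noise.
        y0 = [mu0(x) + rng.gauss(0, sigma) for x in x0]
        y1 = [mu0(x) + tau(x) + rng.gauss(0, sigma) for x in x1]
        est = knn_differencing(x_eval, x0, y0, x1, y1, k)
        sq_errors.append((est - tau(x_eval)) ** 2)
    return math.sqrt(sum(sq_errors) / n_sims)


if __name__ == "__main__":
    print(f"RMSE at x = 0.5: {simulate_rmse():.3f}")
```

Because the two baselines are estimated separately, the estimation error of each fit carries over to their difference, which is the behaviour the paper's matching-based estimators are compared against. Fixing the seed makes repeated runs of this sketch return the same RMSE.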