Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects

Authors: Zijun Gao, Yanjun Han

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the efficacy of the proposed estimator in Algorithm 2 via some numerical experiments. Specifically, we aim to show that the two main ingredients of Algorithm 2, i.e. constructing pseudo-observations based on covariate matching and discarding observations with poor matching quality, are key to improved HTE estimation. We compare our estimator (which we call selected matching) with the following three estimators: the full matching estimator which never discards samples (i.e. m2 = m1 always holds in Algorithm 2), the k-NN differencing and kernel differencing estimators which apply separate k-NN or kernel estimates to both baselines and then take the difference. The performance of HTE estimation is measured via the root mean squared error (RMSE) averaged over 100 simulations. The experimental results are displayed in Figures 1 and 2.
Researcher Affiliation | Academia | Zijun Gao, Department of Statistics, Stanford University, Email: EMAIL; Yanjun Han, Department of Electrical Engineering, Stanford University, Email: EMAIL
Pseudocode | Yes | Algorithm 1: Estimator Construction under Fixed Design; Algorithm 2: Estimator Construction under Random Design
Open Source Code | Yes | The source codes are available at https://github.com/Mathegineer/Nonparametric_HTE.
Open Datasets | No | The paper uses synthetically generated data for its experiments, as described by: 'For each given (n, d, κ, σ), we generate n control covariates X^0_1, ..., X^0_n following the i.i.d. density g_0(x)... Similarly, the treatment covariates X^1_1, ..., X^1_n are i.i.d. generated following the density g_1(x) = 2 g_0(x), and the responses Y^0_i, Y^1_i are defined in (1) with i.i.d. N(0, σ^2) noises.' It does not refer to or provide access information for a pre-existing public dataset.
Dataset Splits | No | The paper uses synthetically generated data for each simulation and does not describe explicit train/validation/test splits of a dataset.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper provides a link to source code but does not list specific software dependencies with version numbers within its text.
Experiment Setup | No | The paper describes the input parameter settings for the simulations (n, d, κ, σ) and states that 'The algorithm parameters are determined by the optimal bias-variance tradeoffs in theory.' However, it does not provide concrete hyperparameter values or detailed training configurations (e.g., the specific m1, m2 values chosen for the experiments, or other optimization settings) for the implemented algorithms.
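
For orientation, the k-NN differencing baseline and the "RMSE averaged over 100 simulations" metric quoted above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the uniform covariate density, the baseline function mu0, the effect function tau, the value of k, and all function names are hypothetical choices made for this sketch, and the paper's selected matching estimator (Algorithm 2) is not reproduced here.

```python
import numpy as np

def knn_regress(x_train, y_train, x_query, k):
    """Plain k-NN regression: average the responses of the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def knn_differencing(x0, y0, x1, y1, x_query, k):
    """Estimate tau(x) = mu1(x) - mu0(x) by differencing two separate k-NN fits."""
    return knn_regress(x1, y1, x_query, k) - knn_regress(x0, y0, x_query, k)

def simulate_rmse(n=200, d=2, sigma=0.5, k=10, n_sims=100, seed=0):
    """Average the per-simulation RMSE of k-NN differencing over n_sims repetitions."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth, chosen only for illustration.
    mu0 = lambda x: np.sin(2 * np.pi * x[:, 0])  # control baseline
    tau = lambda x: x[:, 0]                      # treatment effect
    x_query = rng.uniform(size=(100, d))         # fixed evaluation points
    sq_errs = []
    for _ in range(n_sims):
        # Assumption: both groups are uniform on [0, 1]^d, unlike the paper's g_0, g_1.
        x0 = rng.uniform(size=(n, d))
        x1 = rng.uniform(size=(n, d))
        y0 = mu0(x0) + sigma * rng.normal(size=n)
        y1 = mu0(x1) + tau(x1) + sigma * rng.normal(size=n)
        est = knn_differencing(x0, y0, x1, y1, x_query, k)
        sq_errs.append(np.mean((est - tau(x_query)) ** 2))
    # RMSE per simulation, then averaged across simulations.
    return float(np.mean(np.sqrt(sq_errs)))
```

Calling simulate_rmse() with different n or sigma gives a rough feel for how the baseline's error scales; reproducing the paper's figures would require the densities, parameters, and estimators from the linked repository.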