Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
PUATE: Efficient ATE Estimation from Treated (Positive) and Unlabeled Units
Authors: Masahiro Kato, Fumiaki Kozai, Ryo Inokuchi
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section investigates the empirical performance of the proposed estimators. We also show the experimental results using semi-synthetic data in Appendix M. We generate synthetic data under the censoring setting, where the covariates are drawn from a multivariate normal distribution, $X \sim \zeta_0(x)$, where $\zeta_0(x)$ is the density of $\mathcal{N}(0, I_p)$ and $I_p$ denotes the $p \times p$ identity matrix. We set $p = 3$. We set $P(D = 1 \mid X) = \mathrm{trunc}(\mathrm{sigmoid}(X^\top \beta), 0.1, 0.9)$, where $\beta$ is a coefficient vector sampled from $\mathcal{N}(0, 0.5 I_p)$ and $\mathrm{trunc}(t, a, b)$ clips $t$ to the interval $[a, b]$ ($a < b$). Treatment $D$ is drawn from a Bernoulli distribution with this probability. The observation indicator is generated as $O_i \sim \mathrm{Bernoulli}(c)$ if $D_i = 1$, and $O_i = 0$ if $D_i = 0$. Here, $c$ is drawn from the uniform distribution on $[0, 1]$. The outcome is generated as $Y = X^\top \beta + 1.1 + \tau_0 D + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, 1)$ and $\tau_0 = 3$. We set $n = 3000$. We conduct 5000 trials and report, in Table 1, the empirical mean squared errors (MSEs) and biases with respect to the true ATE, as well as the coverage ratio (Cov. ratio) of the confidence intervals. We also present the empirical distributions of the ATE estimates in Figure 2. |
| Researcher Affiliation | Industry | Masahiro Kato, Fumiaki Kozai, Ryo Inokuchi; Mizuho-DL Financial Technology Co., Ltd., Chiyoda-ku, Tokyo 102-0083 |
| Pseudocode | Yes | Algorithm 1 Cross-fitting in the censoring setting... Algorithm 2 Cross-fitting in the case-control setting |
| Open Source Code | No | Justification: We will organize and provide the experimental code by the camera-ready version. |
| Open Datasets | Yes | In this section, we investigate the empirical performance of our estimators using the Infant Health and Development Program (IHDP) dataset. The dataset contains simulated outcomes paired with covariates observed in the real world (Hill, 2011). |
| Dataset Splits | Yes | Algorithm 1 (Cross-fitting in the censoring setting). Input: observations $\mathcal{D} := \{(X_i, O_i, Y_i)\}_{i=1}^{n}$, number of folds $L$, and estimation methods for $\mu_{T,0}$, $\nu_0$, $\pi_0$. Let $\mathcal{I} = \{1, 2, \dots, n\}$ be the index set. Randomly split $\mathcal{I}$ into $L$ roughly equal-sized folds $(\mathcal{I}^{(\ell)})_{\ell \in [L]}$. Note that $\bigcup_{\ell \in [L]} \mathcal{I}^{(\ell)} = \mathcal{I}$. For each $\ell \in [L]$: set the training data as $\mathcal{I}^{(-\ell)} = \{1, 2, \dots, n\} \setminus \mathcal{I}^{(\ell)}$ and construct estimators of the nuisance parameters on $\mathcal{I}^{(-\ell)}$, denoted by $\hat{\mu}^{(\ell)}_{T,n}$, $\hat{\nu}^{(\ell)}_{n}$, $\hat{\pi}^{(\ell)}_{n}$. Output: an ATE estimate $\hat{\tau}^{\text{cens-eff}}_{n}$ obtained using $\hat{\mu}^{(\ell)}_{T,n}$, $\hat{\nu}^{(\ell)}_{n}$, and $\hat{\pi}^{(\ell)}_{n}$. We set $n = 3000$. We conduct 5000 trials and report the empirical mean squared errors (MSEs) and biases for the true ATE and the coverage ratio (Cov. ratio) computed from the confidence intervals in Table 1. We set $m = 1000$ and $l = 2000$ and compute the same evaluation metrics as in the censoring setting. |
| Hardware Specification | Yes | All experiments were conducted on a Mac computer equipped with an Apple M2 processor and 24 GB of RAM. |
| Software Dependencies | No | The paper mentions using "linear regression and (linear) logistic regression" and "three-layer perceptrons with hidden layers of 100 nodes" for estimation, as well as specific PU learning methods from other papers (Elkan & Noto (2008), Kiryo et al. (2017)). However, it does not provide version numbers for any software (e.g., Python, PyTorch, TensorFlow, or scikit-learn) that would be needed to replicate the computational environment. |
| Experiment Setup | No | The nuisance parameters are estimated using linear regression and (linear) logistic regression. We compared our proposed estimator, $\hat{\tau}^{\text{cens-eff}}_{n}$, with the other candidates, the IPW estimator $\hat{\tau}^{\text{cens-IPW}}_{n}$ and the DM estimator $\hat{\tau}^{\text{cens-DM}}_{n}$, defined in Remarks 4.4 and 4.4, respectively. We set $n = 3000$. We conduct 5000 trials and report the empirical mean squared errors (MSEs) and biases for the true ATE and the coverage ratio (Cov. ratio) computed from the confidence intervals in Table 1. In the case-control setting, covariates for the treatment and unknown groups are generated from different $p$-dimensional normal distributions: $X_T \sim \zeta_{T,0}(x)$ and $X \sim \zeta_0(x) = e_0(1)\zeta_{T,0}(x) + e_0(0)\zeta_C(x)$, where we set $p = 3$; $\zeta_{T,0}(x)$ and $\zeta_C(x)$ are the densities of the normal distributions $\mathcal{N}(\mu_p \mathbf{1}_p, I_p)$ and $\mathcal{N}(\mu_n \mathbf{1}_p, I_p)$, with $\mu_p = 0.5$, $\mu_n = 0$, and $\mathbf{1}_p = (1, 1, 1)^\top$; and $e_0(1)$ is the class prior, set to $e_0(1) = 0.3$. By definition, the propensity score is $e_0(1 \mid x) = e_0(1)\zeta_{T,0}(x)/\zeta_0(x)$. The outcome is generated as in the censoring setting, $Y = X^\top \beta + 1.1 + \tau_0 D + \varepsilon$, where $\tau_0 = 3$. |
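The table lists no released code. For readers who want to rebuild the simulation protocol described in the Research Type, Dataset Splits, and Experiment Setup rows, the following is a minimal sketch that uses only the Python standard library. It rests on several assumptions: $c$ is drawn once per simulated dataset, the $0.5$ in $\mathcal{N}(0, 0.5 I_p)$ is a variance, and $\mathrm{trunc}$ is interpreted as clipping. It does not implement the proposed efficient estimator, the IPW or DM baselines, or their confidence intervals. All function names (`simulate_censoring`, `cross_fit_splits`, `summarize`, `case_control_propensity`) are hypothetical and do not come from the authors.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def trunc(t: float, a: float, b: float) -> float:
    """Clip t to [a, b] (assumed meaning of trunc(t, a, b), a < b)."""
    return min(max(t, a), b)


def simulate_censoring(n=3000, p=3, tau0=3.0, seed=None):
    """Draw one dataset from the censoring-setting design (hypothetical helper).

    Returns (rows, beta, c), each row being (x, d, o, y). The treatment d is
    latent: a PU estimator may only use (x, o, y).
    """
    rng = random.Random(seed)
    beta = [rng.gauss(0.0, math.sqrt(0.5)) for _ in range(p)]  # beta ~ N(0, 0.5 I_p)
    c = rng.uniform(0.0, 1.0)  # assumption: one labeling probability per dataset
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]  # X ~ N(0, I_p)
        xb = sum(xj * bj for xj, bj in zip(x, beta))
        e = trunc(sigmoid(xb), 0.1, 0.9)  # P(D = 1 | X)
        d = 1 if rng.random() < e else 0
        o = 1 if d == 1 and rng.random() < c else 0  # O = 0 whenever D = 0
        y = xb + 1.1 + tau0 * d + rng.gauss(0.0, 1.0)  # epsilon ~ N(0, 1)
        rows.append((x, d, o, y))
    return rows, beta, c


def cross_fit_splits(n, L, seed=None):
    """Yield (train, held_out) index lists for L random, roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[k::L]) for k in range(L)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]  # complement of the fold
        yield train, held_out


def summarize(estimates, lowers, uppers, tau0=3.0):
    """Empirical bias, MSE, and coverage ratio of CIs over repeated trials."""
    k = len(estimates)
    bias = sum(t - tau0 for t in estimates) / k
    mse = sum((t - tau0) ** 2 for t in estimates) / k
    coverage = sum(lo <= tau0 <= hi for lo, hi in zip(lowers, uppers)) / k
    return bias, mse, coverage


def case_control_propensity(x, mu_p=0.5, mu_n=0.0, prior=0.3):
    """e_0(1 | x) = e_0(1) zeta_T(x) / zeta_0(x) for the Gaussian mixture design."""
    # Log-densities of N(mu 1_p, I_p), dropping the constant shared by both groups.
    log_t = math.log(prior) - 0.5 * sum((xj - mu_p) ** 2 for xj in x)
    log_c = math.log(1.0 - prior) - 0.5 * sum((xj - mu_n) ** 2 for xj in x)
    return 1.0 / (1.0 + math.exp(log_c - log_t))


if __name__ == "__main__":
    rows, beta, c = simulate_censoring(seed=0)
    labeled = sum(o for _, _, o, _ in rows)
    print(f"c = {c:.3f}, labeled treated units: {labeled} of {len(rows)}")
    for train, held_out in cross_fit_splits(len(rows), L=2, seed=0):
        # Nuisance models would be fit on `train` and evaluated on `held_out`.
        print(len(train), len(held_out))
```

In this sketch, the cross-fitting helper only produces index splits. Nuisance models fitted on `train` (for example, the linear and logistic regressions mentioned in the Experiment Setup row) would be evaluated on `held_out` and then combined into an ATE estimate. `summarize` then aggregates the per-trial estimates and interval endpoints into the bias, MSE, and coverage metrics reported in Table 1.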