Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributed mediation analysis with communication efficiency
Authors: Shaomin Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical analysis and numerical experiments show that, compared to the global test obtained by pooling all data together, the proposed tests achieve nearly identical power, independent of the number of machines. Furthermore, based on these two distributed test statistics, many enhanced mediation tests derived from the Sobel's or Max P tests can be easily adapted to the distributed system. We apply our method to an educational study, testing whether the effect of high school mathematics on college-level Probability and Mathematical Statistics courses is mediated by Calculus. |
| Researcher Affiliation | Academia | Shaomin Li School of Mathematics and Statistics Beijing Jiaotong University Beijing, 100044 EMAIL |
| Pseudocode | Yes | Algorithm 1: The Distributed Test of Mediation Effects. Step 1. For $k = 1, \ldots, K$, compute the local statistics $T_\beta^{(k)}$ and $T_\gamma^{(k)}$, then transmit them to the central machine. Step 2. In the central machine, compute the distributed statistics $T_\beta$ and $T_\gamma$ using (4), then compute the distributed Sobel statistic $T^{\mathrm{Dis}}_{\mathrm{Sobel}}$ and Max P statistic $T^{\mathrm{Dis}}_{\mathrm{MaxP}}$ using (5) and (6), respectively. Step 3. Given the significance level $\alpha$, if $\lvert T^{\mathrm{Dis}}_{\mathrm{Sobel}} \rvert > Z_{1-\alpha/2}$ or $T^{\mathrm{Dis}}_{\mathrm{MaxP}} < \alpha$, reject $H_0$. |
| Open Source Code | Yes | The code provided in the supplemental materials can be used to reproduce results in both simulation study and real data analysis. |
| Open Datasets | No | The data are sourced from three classes, with only summary information of local data available from each class due to student privacy concerns. Our distributed test successfully detects the mediation effect, which would be undetectable using local tests from just the first or second class. |
| Dataset Splits | Yes | In this setting, we set the sample size in each machine are the same, that is, $n = N/K$. We fix the total sample size $N = 2^{11}$, and set the number of machines $K = 1, 2, 4, 8, 16, 32, 64$. ... In this setting, we first generate the local sample sizes $n_k$ randomly from 50 to 150 for $k = 1, 2, \ldots, 32$. Then, we generate a total of $\sum_{k=1}^{K} n_k$ data points. ... We distributed surveys to three classes at a university. ... Class 1: $n_1 = 34$, ... Class 2: $n_2 = 14$, ... Class 3: $n_3 = 17$. |
| Hardware Specification | Yes | We conducted the tests in R Studio on a Mac Book Pro with an M2 CPU, and each experiment was repeated 2,000 times to calculate the empirical sizes and powers of the tests. |
| Software Dependencies | No | The paper mentions 'R Studio' but does not specify a version number or any other software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | In this section, we conduct extensive simulation studies to evaluate the performance of the proposed distributed Sobel test and Max P test. We generated the p-dimensional exposure variable $A \sim \mathrm{Bernoulli}(0.5)$, the covariate $X \sim N(0, \Sigma_X)$ with $p = 3$ and $\Sigma_X = (0.5^{\lvert j-l \rvert})_{p \times p}$. The mediator $M$ and the outcome $Y$ were simulated as follows: $Y = A + \beta M + \beta_X^\top X + \epsilon_Y$, $\epsilon_Y \sim N(0, 2)$, (7) $M = \gamma A + \gamma_X^\top X + \epsilon_M$, $\epsilon_M \sim N(0, 1)$, (8) where $\beta_X = (1, 0.5, 1)^\top$, $\gamma_X = (0.5, 1, 1)^\top$. Under the null hypothesis, we consider three scenarios: (1) $(\beta, \gamma) = (0.2, 0)$; (2) $(\beta, \gamma) = (0, 0.2)$; (3) $(\beta, \gamma) = (0, 0)$. Under the alternatives, we set $(\beta, \gamma) = (0.2, 0.05)$, $(0.1, 0.1)$, and $(0.05, 0.2)$. ... The significance level $\alpha = 0.05$. We conducted the tests in R Studio on a Mac Book Pro with an M2 CPU, and each experiment was repeated 2,000 times to calculate the empirical sizes and powers of the tests. |
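
The three steps of Algorithm 1 quoted in the Pseudocode row can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation (the paper's own code is in R, in its supplemental materials). Equations (4), (5) and (6) are not reproduced on this page, so the sketch substitutes textbook forms: a sample-size-weighted Stouffer-style combination stands in for (4), the classical Sobel statistic written in terms of z-statistics stands in for (5), and the classical Max P statistic (the larger of the two two-sided p-values) stands in for (6). All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def combine_local_statistics(local_stats, local_sizes):
    """ASSUMED stand-in for equation (4): weighted Stouffer-style combination.

    If each local statistic is approximately N(0, 1) under the null, the
    weights sqrt(n_k / N) keep the combined statistic at unit variance.
    """
    if len(local_stats) != len(local_sizes):
        raise ValueError("need one sample size per local statistic")
    total = sum(local_sizes)
    return sum(sqrt(n / total) * t for t, n in zip(local_stats, local_sizes))


def sobel_statistic(t_beta, t_gamma):
    """ASSUMED stand-in for (5): classical Sobel statistic from z-statistics."""
    denom = sqrt(t_beta ** 2 + t_gamma ** 2)
    return 0.0 if denom == 0.0 else t_beta * t_gamma / denom


def maxp_statistic(t_beta, t_gamma):
    """ASSUMED stand-in for (6): the larger of the two two-sided p-values."""
    p_beta = 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_beta)))
    p_gamma = 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_gamma)))
    return max(p_beta, p_gamma)


def distributed_mediation_test(local_t_beta, local_t_gamma, local_sizes, alpha=0.05):
    """Steps 2 and 3 of Algorithm 1, run on the central machine."""
    # Step 2: combine the local statistics received from the K machines.
    t_beta = combine_local_statistics(local_t_beta, local_sizes)
    t_gamma = combine_local_statistics(local_t_gamma, local_sizes)
    sobel = sobel_statistic(t_beta, t_gamma)
    maxp = maxp_statistic(t_beta, t_gamma)
    # Step 3: compare against the normal critical value and the level alpha.
    critical = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return {
        "sobel": sobel,
        "maxp": maxp,
        "reject_sobel": abs(sobel) > critical,
        "reject_maxp": maxp < alpha,
    }


if __name__ == "__main__":
    # Class sizes are those quoted in the Dataset Splits row; the local
    # statistics below are made-up numbers used only to exercise the code.
    sizes = [34, 14, 17]
    hypothetical_t_beta = [1.5, 1.0, 1.2]
    hypothetical_t_gamma = [1.8, 0.9, 1.1]
    print(distributed_mediation_test(hypothetical_t_beta, hypothetical_t_gamma, sizes))
```

Only the two local statistics and the local sample size leave each machine in this sketch, which matches the communication pattern described in Step 1. Whether the paper's weights agree with the Stouffer-style weights assumed here should be checked against equation (4) in the paper itself.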