Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Diffusion Transformers for Imputation: Statistical Efficiency and Uncertainty Quantification

Authors: Zeqi Ye, Minshuo Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our work addresses this gap by investigating the statistical efficiency of conditional diffusion transformers for imputation and quantifying the uncertainty in missing values. Specifically, we derive statistical sample complexity bounds based on a novel approximation theory for conditional score functions using transformers, and, through this, construct tight confidence regions for missing values. Our findings also reveal that the efficiency and accuracy of imputation are significantly influenced by the missing patterns. Furthermore, we validate these theoretical insights through simulation and propose a mixed-masking training strategy to enhance the imputation performance.
Researcher Affiliation	Academia	Zeqi Ye Northwestern University Evanston, IL, USA EMAIL Minshuo Chen Northwestern University Evanston, IL, USA EMAIL
Pseudocode	Yes	Algorithm 1: Diffusion-Based Sequence Imputation. 1: Module I: Training. 2: Input: Fully observed sequences $\mathcal{D} := \{X_i\}_{i=1}^{n}$, a masking strategy. 3: Simulate pairs $\{x_{obs}^{(i)}, x_{miss}^{(i)}\}_{i=1}^{n}$ via the masking strategy, and train a conditional diffusion model. 4: Output: A well-trained conditional diffusion model. 5: Module II: Imputation. 6: Input: Conditional diffusion model from Module I, a new partial sequence $x_{obs}$, repetition time $Z$, and confidence level $1-\alpha$. 7: Conditioned on $x_{obs}$, independently generate $Z$ missing sequences $\hat{x}_{miss}^{(z)}$ for $z = 1, \dots, Z$. 8: Point estimate: Mean $\hat{x}_{miss} = \frac{1}{Z}\sum_{z=1}^{Z} \hat{x}_{miss}^{(z)}$ (or median of the generated sequences). 9: Confidence region: $\widehat{CR}_{1-\alpha} = \{x_{miss} : \|x_{miss} - \hat{x}_{miss}\|_2 \le \hat{D}_{1-\alpha}\}$, where $\hat{D}_{1-\alpha}$ is the $1-\alpha$ upper quantile of $\|\hat{x}_{miss}^{(z)} - \hat{x}_{miss}\|_2$ for $z = 1, \dots, Z$. 10: Return: $\hat{x}_{miss}$ and $\widehat{CR}_{1-\alpha}$.
Open Source Code	Yes	Our code is available at https://github.com/liamyzq/DiT_time_series_imputation.
Open Datasets	Yes	We generate Gaussian process data with sequence length H = 96, dimension d = 8, and define the missing segment length as $|I_{miss}| = 16$. In addition to applying Algorithm 1 to construct 95% confidence regions (CRs), we sample from the true conditional distribution to evaluate CR coverage (the proportion of true values that fall within the estimated CR) for comparison. We utilize two real-world datasets, Beijing Air [Zhang et al., 2017] and ETT_m1, to benchmark the imputation performance of DiT. The Beijing Air dataset comprises hourly measurements of six air pollutants and meteorological variables collected from 12 monitoring sites in Beijing. The ETT_m1 dataset, part of the Electricity Transformer Temperature benchmark, records clients' electricity consumption data, including power load and oil temperature. Detailed statistics for both datasets are provided in Table 5.
Dataset Splits	Yes	Table 5: 80% of the data is used for training, and 20% for testing.
Hardware Specification	Yes	Experiments were conducted on hardware consisting of an NVIDIA RTX A6000 GPU (48GB) and an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz.
Software Dependencies	No	Our adapted DiT model architecture used a hidden size of 256, 12 transformer layers, and 16 attention heads per layer. We utilized the PyPOTS [Du, 2023] framework to implement and handle hyperparameter tuning for the baseline methods CSDI and GP-VAE.
Experiment Setup	Yes	For our numerical experiments, we trained the models using a batch size of 64. Our adapted DiT model architecture used a hidden size of 256, 12 transformer layers, and 16 attention heads per layer. We utilized the PyPOTS [Du, 2023] framework to implement and handle hyperparameter tuning for the baseline methods CSDI and GP-VAE. This tuning process aimed to find the best settings and ensure the models had a comparable number of trainable parameters. Experiments were conducted on hardware consisting of an NVIDIA RTX A6000 GPU (48GB) and an Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz. We report all the results as the average of 5 runs.
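
Illustrative sketch (not the authors' code): the imputation module of Algorithm 1, quoted in the Pseudocode row, reduces to a sample mean and an empirical quantile of L2 distances once the Z generated sequences are available. The NumPy snippet below shows that computation under stated assumptions. The function names and `toy_sampler` are hypothetical; `toy_sampler` draws Gaussian noise as a stand-in for the trained conditional diffusion model, and the "1 - alpha upper quantile" is read here as the empirical (1 - alpha)-quantile of the distances.

```python
import numpy as np


def impute_with_confidence_region(samples, alpha=0.05):
    """Point estimate and L2-ball radius from Z generated missing sequences.

    samples: array of shape (Z, ...) holding the generated sequences.
    Returns the mean sequence and the radius of the confidence region.
    """
    samples = np.asarray(samples, dtype=float)
    num_samples = samples.shape[0]
    flat = samples.reshape(num_samples, -1)

    # Point estimate: mean over the Z generated sequences.
    point = flat.mean(axis=0)

    # L2 distance of every generated sequence to the point estimate.
    dists = np.linalg.norm(flat - point, axis=1)

    # Assumption: radius = empirical (1 - alpha)-quantile of those distances.
    radius = np.quantile(dists, 1.0 - alpha)
    return point.reshape(samples.shape[1:]), radius


def in_region(x_miss, point, radius):
    """True if x_miss lies inside the L2 ball centred at the point estimate."""
    diff = np.asarray(x_miss, dtype=float) - point
    return bool(np.linalg.norm(diff) <= radius)


def toy_sampler(rng, num_samples, shape):
    """Hypothetical stand-in for the conditional diffusion model's sampler."""
    return rng.normal(size=(num_samples, *shape))


rng = np.random.default_rng(0)
# Shapes follow the synthetic setup quoted above: 16 missing steps, d = 8.
samples = toy_sampler(rng, 200, (16, 8))
point, radius = impute_with_confidence_region(samples, alpha=0.05)
inside = np.mean([in_region(s, point, radius) for s in samples])
```

By construction, roughly a 1 - alpha fraction of the generated sequences fall inside the resulting ball. Whether the region also covers the true missing values at that rate depends on the quality of the trained model, which is what the coverage experiment quoted in the Open Datasets row evaluates.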