Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Statistical Inference under Performativity

Authors: Xiang Li, Yunai Li, Huiying Zhong, Lihua Lei, Zhun Deng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate the effectiveness of our framework through numerical experiments. To quantify the results of PPI under performativity, we evaluate the confidence-region coverage and width for three strategies: $\lambda = 0$ (only labeled data), $\lambda = 1$ (full unlabeled data weight), and our optimization method $\lambda = \hat{\lambda}_t$ as defined in Eq. 3. We vary the labeled sample size n and perform $t \in \{2, 3, 4\}$ repeated risk minimization steps, averaging results over 1000 independent trials. In Figure 1, we can find that all three methods approach 0.9 coverage as n grows, while our optimized $\hat{\lambda}_t$ (orange) achieves the narrowest interval width, supporting its effectiveness to enhance the performative inference. The dashed curves denote the bias-adjusted confidence regions for the performative stable point $\theta_{PS}$. It can be observed that $\theta_{PS}$ coverages upper-bound that of $\theta_t$ (solid curves) across steps t, and the gap between them vanishes as t grows. This observation verifies the validity of Corollary 3.7.
Researcher Affiliation Academia Xiang Li* Independent Researcher EMAIL Yunai Li* Northwestern University EMAIL Huiying Zhong* MIT EMAIL Lihua Lei Stanford University EMAIL Zhun Deng UNC at Chapel Hill EMAIL
Pseudocode No The paper describes the 'repeated risk minimization' procedure as follows: 'Specifically, one starts with an arbitrary $\theta_0$ and repeat the following procedure: $\theta_{t+1} = \arg\min_{\theta} \mathbb{E}_{z \sim \mathcal{D}(\theta_t)} \ell(z; \theta)$ for $t \in \mathbb{N}$.' However, this description is embedded in the main text and is not presented as a formal pseudocode or algorithm block with a distinct label or structured formatting.
Open Source Code Yes We provide a zip file containing our simulation study's code.
Open Datasets Yes Following Perdomo et al. [25], we further conduct a case study in a semi-synthetic way on a realistic credit scoring task using a Kaggle dataset (footnote 4). ... Footnote 4: https://www.kaggle.com/c/GiveMeSomeCredit/data
Dataset Splits No We set N = 2000 and vary the labeled sample size n. ... We add Gaussian noise to the original data feature to generate an unlabeled set of the same size. Then, we sample varying n labeled points with N = 18000 unlabeled points and perform t = 5 repeated risk minimization steps to compute the estimated $\hat{\theta}_t$ and build the confidence region for it over 100 independent trials.
Hardware Specification Yes We run our experiments on NVIDIA GPUs A100 in a single-GPU setup.
Software Dependencies No We collect $\hat{\theta}_{1:t}$ trajectory and corresponding data $\{z_{1:t,i}\}_{i=1}^{n}$ to train both models via the SGD optimizer with a learning rate of 0.1 to minimize the empirical score-matching objective J(ψ).
Experiment Setup Yes In the following experiments, we set d = 2, ε 0.02, γ = 2, and $\sigma_y^2 = 0.2$. We set N = 2000 and vary the labeled sample size n. ... we train both models via the SGD optimizer with a learning rate of 0.1 to minimize the empirical score-matching objective J(ψ).
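
The repeated risk minimization update quoted in the Pseudocode row can be illustrated with a minimal Python sketch. This is not the authors' code. The distribution map D(θ) = N(μ + εθ, 1), the squared loss, the parameter values, and the function names `sample_from_d`, `repeated_risk_minimization`, and `stable_point` are all assumptions, chosen so that each arg min has a closed form (the sample mean).

```python
import numpy as np


def sample_from_d(theta, n, rng, mu=1.0, eps=0.5):
    """Draw n points from the hypothetical toy map D(theta) = N(mu + eps * theta, 1)."""
    return rng.normal(loc=mu + eps * theta, scale=1.0, size=n)


def repeated_risk_minimization(theta0, steps, n, rng, mu=1.0, eps=0.5):
    """Empirical RRM for the assumed squared loss l(z; theta) = (z - theta)^2 / 2.

    For this loss, the minimizer over a sample drawn from D(theta_t) is the
    sample mean, so each update theta_{t+1} is computed in closed form.
    """
    trajectory = [float(theta0)]
    for _ in range(steps):
        # Deploy the current parameter, observe data from the induced distribution.
        z = sample_from_d(trajectory[-1], n, rng, mu=mu, eps=eps)
        # Minimize the empirical risk on that data (sample mean for squared loss).
        trajectory.append(float(z.mean()))
    return trajectory


def stable_point(mu=1.0, eps=0.5):
    """Fixed point of the population update theta -> mu + eps * theta (needs |eps| < 1)."""
    return mu / (1.0 - eps)


rng = np.random.default_rng(0)
traj = repeated_risk_minimization(theta0=0.0, steps=20, n=100_000, rng=rng)
print(traj[-1], stable_point())
```

Under these assumptions the population update is θ ↦ μ + εθ, so for |ε| < 1 the iterates contract toward μ / (1 − ε), and the two printed values should be close when n is large.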