Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Disentangling misreporting from genuine adaptation in strategic settings: a causal approach

Authors: Dylan Zapzalka, Trenton Chang, Lindsay Warrenburg, Sae-Hwan Park, Daniel Shenfeld, Ravi B. Parikh, Jenna Wiens, Maggie Makar

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our theoretical results using a semi-synthetic and real Medicare dataset with misreported data, demonstrating that our approach can be employed to identify misreporting in real-world scenarios. 5 Empirical Results We evaluate the performance of our approach (CMRE) on semi-synthetic and real-world data. We show that CMRE consistently yields reliable estimates of the MR, even when genuine adaptation is present, and outperforms relevant baselines.
Researcher Affiliation | Academia | Dylan Zapzalka (1), Trenton Chang (1), Lindsay Warrenburg (2), Sae-Hwan Park (2), Daniel K. Shenfeld (2), Ravi B. Parikh (2, 3), Jenna Wiens (1), Maggie Makar (1). Affiliations: (1) University of Michigan; (2) University of Pennsylvania; (3) Emory University.
Pseudocode | Yes | Algorithm 1 (CMRE algorithm). Input: D = {(xi, yi, ci, ai)}N i and D = {(x i , yi, ci)}M i. Output: ˆMR, an estimate of the MR for each agent. For each agent a: estimate θa(c) using equation 4; estimate θ (c) using equation 5; estimate ˆτ a, ˆτa, and ˆδ a using equation 6; return ˆτ a ˆτa ˆδ a. Algorithm 2 (NMRE algorithm). Input: D = {(xi, yi, ci, ai)}N i and D = {(x i , yi, ci)}M i. Output: ˆMR, an estimate of the MR for each agent. For each agent a: estimate ˆτ using equation 7; estimate ˆτa using equation 8; return ˆτ ˆτa. Algorithm 3 (NDEE algorithm).
Open Source Code | Yes | Code for our experiments is publicly available at https://github.com/DylanJamesZapzalka/misreporting_estimation.
Open Datasets | Yes | We extract the confounders from a real credit card dataset (n = 30,000) [45, 46]. [45] I-Cheng Yeh. Default of Credit Card Clients. UCI Machine Learning Repository, 2009. DOI: https://doi.org/10.24432/C55S3H.
Dataset Splits | Yes | Each experiment is repeated 100 times, each with a different draw of A, X, X , and Y . We use an 80/20 train/test split of D where our models are trained on the larger split and the MR is estimated using the smaller split. We do not split D as it is only used to train the models.
Hardware Specification | Yes | All experiments were conducted using 16 CPU cores and 32 GB of memory on a computing cluster with 2 x 2.5 GHz Intel Haswell (Xeon E5-2680v3) processors, which was managed using a Slurm resource manager.
Software Dependencies | Yes | All of the code for the experiments was written in Python 3.10.16 (PSF License). The XGBoost models were implemented using XGBoost 2.1.4 (Apache License 2.0) [7]. The OC-SVM baseline was implemented using the One-Class SVM implementation in scikit-learn 1.6.1 (BSD License) [35]. To generate the semi-synthetic datasets and for data processing tasks, both numpy 2.0.2 (modified BSD license) [16] and pandas 2.2.3 (BSD license) [34] were employed. For the Medicare dataset, HCCPy 0.1.9 (Apache License 2.0) was employed to extract the HCCs from raw data. All plots were created using matplotlib 3.10.1 (PSF License) [20].
Experiment Setup | Yes | The complete algorithm for CMRE is summarized in Algorithm 1. We note that for our experiments, we split D such that the data used to train fa(c, x) in equation 4 is different than the data used to estimate the MR in equation 6. Specifically, 80% of the data in D is used to train fa(c, x) in equation 4 and the other 20% is used to estimate ˆτ a, ˆτa, and ˆδ a. ... To estimate θa(c) and θ (c), we employ an S-learner, where the models fa and f are implemented using XGBoost. We use the default hyperparameters provided by the XGBoost library in Python to train each model [7], including a learning rate of 0.3, a maximum tree depth of 6, and L2 regularization with a coefficient of 1. ... We use the One-Class SVM implementation from the scikit-learn library [35]. Given our assumption that all data points in D 1 are correctly reported, we used a small ν parameter (0.01). Additionally, we use an RBF kernel with a bandwidth parameter γ = 0.1.
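
The split protocol and hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration and is not taken from the authors' repository: the function names, the per-repetition seeding scheme, and the keyword names in the two dictionaries (chosen to match XGBoost's scikit-learn wrapper and scikit-learn's OneClassSVM) are assumptions. Only the numeric values come from the quoted text. Redrawing A, X, and Y on each repetition and fitting the models are left out.

```python
import random

# Values reported in the Experiment Setup row. The keyword names are an
# assumption about how they would be passed to XGBoost's scikit-learn wrapper
# and to scikit-learn's OneClassSVM; they are not copied from the authors' code.
XGB_PARAMS = {"learning_rate": 0.3, "max_depth": 6, "reg_lambda": 1.0}
OCSVM_PARAMS = {"kernel": "rbf", "nu": 0.01, "gamma": 0.1}

N_REPEATS = 100        # "Each experiment is repeated 100 times"
TRAIN_FRACTION = 0.8   # "80/20 train/test split of D"


def split_indices(n_rows, seed, train_fraction=TRAIN_FRACTION):
    """Shuffle row indices with a seeded RNG and cut them into two parts.

    The first part is meant for model training, the second for estimating
    the MR. Seeding by repetition number is a hypothetical choice.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def repeated_splits(n_rows, n_repeats=N_REPEATS):
    """Yield one (train_indices, estimation_indices) pair per repetition."""
    for seed in range(n_repeats):
        yield split_indices(n_rows, seed)
```

For a dataset with 30,000 rows, each repetition would give 24,000 training indices and 6,000 estimation indices, and the two dictionaries could be unpacked into the corresponding model constructors if those libraries are installed.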