Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Integer Subspace Differential Privacy

Authors: Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Fang-Yi Yu

AAAI 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of our proposal with applications to a synthetic problem with intersecting invariants, a sensitive contingency table with known margins, and the 2010 Census county-level demonstration data with mandated fixed state population totals.
Researcher Affiliation	Academia	Prathamesh Dharangutte1, Jie Gao1, Ruobin Gong2, Fang-Yi Yu3 1Department of Computer Science, Rutgers University 2Department of Statistics, Rutgers University 3Department of Computer Science, George Mason University
Pseudocode	Yes	Algorithm 1 in Appendix D of this paper's full version (Dharangutte et al. 2022) presents a Gibbs-within-Metropolis sampler that produces a sequence of dependent draws z(l), 0 <= l <= nsim, from the target distribution q_epsilon in (5), known only up to a normalizing constant. We use an additive jumping distribution whose element-wise construction is described in (11). The algorithm incurs a transition kernel that dictates how the chain moves from an existing state to the next one: z(l) ~ K(z(l-1)).
Open Source Code	No	The paper refers to an extended version on ArXiv for additional details and algorithms, but does not provide a direct link to any open-source code repository for the methodology described.
Open Datasets	Yes	The Federal Committee on Statistical Methodology published a fictitious dataset concerning delinquent children in the form of a 4 x 4 contingency table, tabulated across four counties by education level of household head (Table 4 in Federal Committee on Statistical Methodology 2005, reproduced in Table 2 of Appendix E). The confidential values are the 2010 Census Summary Files (CSF), curated by IPUMS NHGIS and are publicly available (Van Riper, Kugler, and Schroeder 2020).
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into training, validation, or test sets.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup	Yes	To ensure adequate dispersion of the target distribution, we set epsilon = 0.25, a value on the smaller end within the range of meaningful privacy protection (e.g. Dwork 2011). The pre-jump proposal distributions eta_j are double geometric distributions, with parameter a = exp(-1) for the l1-norm target and a = exp(-1.5) for the l2-norm target.
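
The Experiment Setup row names double geometric pre-jump proposals with a = exp(-1) and a = exp(-1.5). As a rough illustration only, the sketch below draws from a double geometric (discrete Laplace) distribution, assuming the common parameterization P(X = k) = (1 - a) / (1 + a) * a^|k| for integer k. That parameterization, and the names `double_geometric_pmf`, `sample_double_geometric`, `A_L1`, and `A_L2`, are assumptions made here and are not taken from the paper. The sketch does not reproduce the paper's Algorithm 1 and does not enforce any invariants.

```python
import math
import random

# Proposal parameters quoted in the Experiment Setup row.
A_L1 = math.exp(-1.0)   # l1-norm target
A_L2 = math.exp(-1.5)   # l2-norm target


def double_geometric_pmf(k, a):
    """P(X = k) under the assumed form (1 - a) / (1 + a) * a**|k|, 0 < a < 1."""
    return (1.0 - a) / (1.0 + a) * a ** abs(k)


def _geometric(a, rng):
    """Draw G on {0, 1, 2, ...} with P(G = k) = (1 - a) * a**k by inversion."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return math.floor(math.log(u) / math.log(a))


def sample_double_geometric(a, rng):
    """The difference of two i.i.d. geometric draws is double geometric."""
    return _geometric(a, rng) - _geometric(a, rng)


if __name__ == "__main__":
    rng = random.Random(2023)  # hypothetical seed, for repeatability only
    draws = [sample_double_geometric(A_L1, rng) for _ in range(10)]
    print(draws)
```

A smaller a concentrates the proposal more tightly around zero, so under this parameterization the a = exp(-1.5) setting proposes shorter jumps on average than a = exp(-1).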