Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Differentially Private Boxplots
Authors: Kelly Ramsay, Jairo Diaz-Rodriguez
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulations, we show that this boxplot performs similarly to a non-private boxplot, and it outperforms the naive boxplot. Additionally, we conduct a real data analysis of Airbnb listings, which shows that comparable analysis can be achieved through differentially private boxplot visualization. |
| Researcher Affiliation | Academia | Department of Mathematics and Statistics, York University, Toronto, Canada. Correspondence to: Kelly Ramsay <EMAIL>. |
| Pseudocode | Yes | The algorithm, which we call DPBoxplot, is summarized in Algorithm 3, see also, Figure 1. |
| Open Source Code | Yes | Code. Official implementation is available at https://github.com/jairoadiazr/DPBoxplot. |
| Open Datasets | Yes | We analyze a dataset containing Airbnb listing prices and associated metrics within New York City (NYC) in 2019 (Kaggle, 2019). |
| Dataset Splits | No | The paper describes using the Airbnb dataset and filtering some data points: 'After removing listings priced above 500 US dollars (USD) and requiring minimum nights of stay fewer than 10, this dataset has n = 40738 observations and d = 4 explanatory variables of business interest.' However, it does not specify any explicit training, validation, or test dataset splits for reproducibility. |
| Hardware Specification | Yes | All simulations were conducted on a single CPU and did not require significant computational resources. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for libraries, frameworks, or programming languages used in the implementation or experiments. |
| Experiment Setup | Yes | For all algorithms, we set a = 50 and b = 50 and $\lambda_n = n^{-1/4}$. We ran the same simulations with other values of $\lambda_n$ and found it did not alter the conclusions of the study, see Appendix C.2. For the unbounded algorithm, β was set to the default value of 1.001 for both the naive boxplot and DPBoxplot. |
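
The filtering rule quoted in the Dataset Splits row and the $\lambda_n = n^{-1/4}$ setting quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names `filter_listings` and `lambda_n`, the toy rows, and the field names `price` and `minimum_nights` are assumptions made for illustration only.

```python
# Illustrative sketch only; helper names, field names, and toy rows are assumptions.

def lambda_n(n: int) -> float:
    """Return n ** (-1/4), the lambda_n setting quoted in the Experiment Setup row."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return n ** -0.25


def filter_listings(rows):
    """Drop listings priced above 500 USD and keep those with fewer than 10 minimum nights.

    Each row is a dict; the keys "price" and "minimum_nights" are assumed names.
    """
    return [r for r in rows if r["price"] <= 500 and r["minimum_nights"] < 10]


# Toy rows standing in for the real listings (values are made up).
toy_rows = [
    {"price": 120, "minimum_nights": 2},   # kept
    {"price": 650, "minimum_nights": 1},   # dropped: price above 500
    {"price": 80, "minimum_nights": 30},   # dropped: minimum nights not below 10
    {"price": 500, "minimum_nights": 9},   # kept: boundary values pass
]

kept = filter_listings(toy_rows)
print(f"kept {len(kept)} of {len(toy_rows)} toy rows")

# lambda_n evaluated at the sample size reported after filtering (n = 40738).
print(f"lambda_n for n = 40738: {lambda_n(40738):.4f}")
```

With the reported sample size of n = 40738, this setting gives a value of roughly 0.07. The boundary handling (price exactly 500 kept, minimum nights exactly 10 dropped) follows a literal reading of the quoted sentence and may differ from the official implementation.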