Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Naive Bayes Classifiers over Missing Data: Decision and Poisoning
Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines. |
| Researcher Affiliation | Academia | 1Department of Computer Sciences, University of Wisconsin-Madison, Madison WI, USA. |
| Pseudocode | Yes | Algorithm 1: Iterative Algorithm for the Decision Problem |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/Waterpine/NBC-Missing. |
| Open Datasets | Yes | We use ten real-world datasets from Kaggle (web, 2022a): heart (HE) (dat, 2023e), fitness-club (FC) (dee dee, 2023), fetalhealth (FH) (dat, 2023d), employee (EM) (dat, 2023c), winequalityN (WQ) (dat, 2023g), company-bankruptcy (CB) (dat, 2023b), Mushroom (MR) (dat, 2023f), bodyPerformance (BP) (dat, 2023a), star-classification (SC) (fedesoriano, 2023), creditcard (CC) (Elgiriyewithana, 2023). |
| Dataset Splits | No | The paper only specifies an 80% training and 20% testing split, without explicit mention of a validation split. |
| Hardware Specification | Yes | Our experiments were performed on a bare-metal server provided by Cloudlab (Cloud Lab). The server is equipped with two 10-core Intel Xeon E5-2660 CPUs running at 2.60 GHz. |
| Software Dependencies | No | The paper mentions using 'sklearn’s KBinsDiscretizer (web, 2022b)' but does not provide specific version numbers for Python or other libraries. |
| Experiment Setup | Yes | We first preprocess every dataset so that it contains only categorical features by partitioning each numerical feature into 5 segments (or bins) of equal size using sklearn's KBinsDiscretizer (web, 2022b). |
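
The discretization step quoted under Experiment Setup can be illustrated with a minimal sketch. The paper uses sklearn's KBinsDiscretizer; the pure-Python function below is a stand-in, not the authors' code. It assumes that "bins of equal size" means equal-width bins, which the quoted text does not confirm (equal-frequency bins are another plausible reading). The function name `equal_width_bins` and the sample column `ages` are hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Map each numeric value to a bin index in [0, n_bins - 1].

    Assumption: bins have equal width over [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; place every value in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric feature, turned into a categorical one with 5 levels.
ages = [29, 35, 41, 48, 54, 60, 67, 77]
print(equal_width_bins(ages))
```

Each numerical column is replaced by its bin indices, so the resulting dataset contains only categorical features, as the quoted setup requires.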