Naive Bayes Classifiers over Missing Data: Decision and Poisoning

Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines.
Researcher Affiliation | Academia | ¹Department of Computer Sciences, University of Wisconsin-Madison, Madison WI, USA.
Pseudocode | Yes | Algorithm 1: Iterative Algorithm for the Decision Problem
Open Source Code | Yes | Our implementation is publicly available at https://github.com/Waterpine/NBC-Missing.
Open Datasets | Yes | We use ten real-world datasets from Kaggle (web, 2022a): heart (HE) (dat, 2023e), fitness-club (FC) (dee dee, 2023), fetalhealth (FH) (dat, 2023d), employee (EM) (dat, 2023c), winequalityN (WQ) (dat, 2023g), company-bankruptcy (CB) (dat, 2023b), Mushroom (MR) (dat, 2023f), bodyPerformance (BP) (dat, 2023a), star-classification (SC) (fedesoriano, 2023), creditcard (CC) (Elgiriyewithana, 2023).
Dataset Splits | No | The paper only specifies an 80% training and 20% testing split, without explicit mention of a validation split.
Hardware Specification | Yes | Our experiments were performed on a bare-metal server provided by Cloudlab (Cloud Lab). The server is equipped with two 10-core Intel Xeon E5-2660 CPUs running at 2.60 GHz.
Software Dependencies | No | The paper mentions using "sklearn's KBinsDiscretizer (web, 2022b)" but does not provide specific version numbers for Python or other libraries.
Experiment Setup | Yes | We first preprocess every dataset so that it contains only categorical features by partitioning each numerical feature into 5 segments (or bins) of equal size using sklearn's KBinsDiscretizer (web, 2022b).
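
The preprocessing and split described in the table can be illustrated with a minimal sketch. It is not the authors' code: it is a standard-library stand-in for the binning step, and it assumes that "equal size" means equal-width bins (the rough counterpart of sklearn's KBinsDiscretizer with n_bins=5, encode="ordinal", strategy="uniform"). If the paper meant equal-frequency bins, the edges would be quantiles instead. The function names, the seed, and the sample values are hypothetical.

```python
import bisect
import random


def equal_width_bins(values, n_bins=5):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant feature collapses to a single category.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Interior edges only; bisect_right places a value equal to an edge
    # in the upper bin, and the maximum lands in the last bin.
    inner_edges = [lo + width * k for k in range(1, n_bins)]
    return [bisect.bisect_right(inner_edges, v) for v in values]


def train_test_split_80_20(rows, seed=0):
    """Shuffle rows and return (train, test) with an 80%/20% split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical numeric feature, not taken from any of the ten datasets.
ages = [29, 35, 41, 47, 53, 59, 65, 71, 77, 80]
binned = equal_width_bins(ages)
train, test = train_test_split_80_20(list(zip(binned, ages)))
```

After this step every feature takes at most five categorical values, which is the form the table says the datasets are brought into before the experiments. No validation split is produced, matching the "Dataset Splits" row.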