reproducibilityindex.ai

Naive Bayes Classifiers over Missing Data: Decision and Poisoning

Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines.
Researcher Affiliation	Academia	Department of Computer Sciences, University of Wisconsin-Madison, Madison WI, USA.
Pseudocode	Yes	Algorithm 1: Iterative Algorithm for the Decision Problem
Open Source Code	Yes	Our implementation is publicly available at https://github.com/Waterpine/NBC-Missing.
Open Datasets	Yes	We use ten real-world datasets from Kaggle (web, 2022a): heart (HE) (dat, 2023e), fitness-club (FC) (dee dee, 2023), fetalhealth (FH) (dat, 2023d), employee (EM) (dat, 2023c), winequalityN (WQ) (dat, 2023g), company-bankruptcy (CB) (dat, 2023b), Mushroom (MR) (dat, 2023f), bodyPerformance (BP) (dat, 2023a), star-classification (SC) (fedesoriano, 2023), creditcard (CC) (Elgiriyewithana, 2023).
Dataset Splits	No	The paper only specifies an 80% training and 20% testing split, without explicit mention of a validation split.
Hardware Specification	Yes	Our experiments were performed on a bare-metal server provided by Cloudlab (Cloud Lab). The server is equipped with two 10-core Intel Xeon E5-2660 CPUs running at 2.60 GHz.
Software Dependencies	No	The paper mentions using "sklearn's KBinsDiscretizer (web, 2022b)" but does not provide specific version numbers for Python or other libraries.
Experiment Setup	Yes	We first preprocess every dataset so that it contains only categorical features by partitioning each numerical feature into 5 segments (or bins) of equal size using sklearn's KBinsDiscretizer (web, 2022b).
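
The preprocessing and split described in the table can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code. It assumes that "equal size" means equal-width bins (`strategy="uniform"`); equal-frequency bins (`strategy="quantile"`) would be another reading. It also assumes that discretization is applied to the full dataset before the 80/20 split. The random seed and the synthetic arrays `X` and `y` are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for a dataset with numerical features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Partition each numerical feature into 5 bins and encode the bin index
# as an ordinal category. strategy="uniform" gives equal-width bins; this
# choice is an assumption about what "equal size" means in the paper.
discretizer = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_binned = discretizer.fit_transform(X).astype(int)

# 80% training / 20% testing split, as reported under "Dataset Splits".
X_train, X_test, y_train, y_test = train_test_split(
    X_binned, y, test_size=0.2, random_state=0
)
```

After this step every feature takes integer values in the range 0 to 4, so the data contains only categorical features, which is the form a categorical Naive Bayes classifier expects.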