Naive Bayes Classifiers over Missing Data: Decision and Poisoning
Authors: Song Bian, Xiating Ouyang, Zhiwei Fan, Paraschos Koutris
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our algorithms are efficient and outperform existing baselines. |
| Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison, Madison WI, USA. |
| Pseudocode | Yes | Algorithm 1: Iterative Algorithm for the Decision Problem |
| Open Source Code | Yes | Our implementation is publicly available at https://github.com/Waterpine/NBC-Missing. |
| Open Datasets | Yes | We use ten real-world datasets from Kaggle (web, 2022a): heart (HE) (dat, 2023e), fitness-club (FC) (dee dee, 2023), fetalhealth (FH) (dat, 2023d), employee (EM) (dat, 2023c), winequalityN (WQ) (dat, 2023g), company-bankruptcy (CB) (dat, 2023b), Mushroom (MR) (dat, 2023f), bodyPerformance (BP) (dat, 2023a), star-classification (SC) (fedesoriano, 2023), creditcard (CC) (Elgiriyewithana, 2023). |
| Dataset Splits | No | The paper only specifies an 80% training and 20% testing split, without explicit mention of a validation split. |
| Hardware Specification | Yes | Our experiments were performed on a bare-metal server provided by Cloudlab (Cloud Lab). The server is equipped with two 10-core Intel Xeon E5-2660 CPUs running at 2.60 GHz. |
| Software Dependencies | No | The paper mentions using 'sklearn's KBinsDiscretizer (web, 2022b)' but does not provide specific version numbers for Python or other libraries. |
| Experiment Setup | Yes | We first preprocess every dataset so that it contains only categorical features by partitioning each numerical feature into 5 segments (or bins) of equal size using sklearn's KBinsDiscretizer (web, 2022b). |
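
The preprocessing and split described in the table can be illustrated with a short, dependency-free sketch. It is not the authors' code: the paper uses sklearn's KBinsDiscretizer, whereas the helpers `equal_width_bins` and `train_test_split_80_20` below are hypothetical stand-ins written for this illustration. The sketch assumes that "equal size" means equal-width bins (the behaviour of `strategy="uniform"` in KBinsDiscretizer); if the paper intends equal-frequency bins, `strategy="quantile"` would be the closer match. The shuffling seed and the sample values are likewise assumptions.

```python
import random


def equal_width_bins(values, n_bins=5):
    """Map each numeric value to a bin index in [0, n_bins - 1].

    Assumption: bins have equal width over [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and return an 80% / 20% (train, test) split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical numerical feature with ten rows.
feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
binned = equal_width_bins(feature, n_bins=5)
train, test = train_test_split_80_20(binned)
print(binned)
print(len(train), len(test))
```

After this step every feature is categorical with at most five values, which is the form of input the table says the experiments operate on.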