Dynamic Early Stopping for Naive Bayes

Authors: Aäron Verachtert, Hendrik Blockeel, Jesse Davis

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. From Section 5 (Experimental Evaluation): "The goal of our empirical evaluation is to answer the following questions: 1. For a fixed attribute budget k, how does the dynamic approach compare to the static one in terms of efficiency and accuracy? 2. How does our proposed approach compare to same-decision probability (SDP) [Choi et al., 2012] and cost-sensitive Naive Bayes (csNB) [Chai et al., 2004]? These represent two other ways to dynamically make a prediction based on only a subset of the attributes. To answer the first question, we train two types of models for each attribute ordering. The baseline is a static model that always uses k attributes to make a prediction. Then we learn a dynamic model according to our proposed approach that can consider at most k attributes at prediction time. To explore a range of different operating conditions, we investigate various values of k." From Section 5.1 (Data Sets): "We perform an evaluation using seven data sets from various domains, summarized in Table 1."
Researcher Affiliation: Academia. Aäron Verachtert, Hendrik Blockeel, and Jesse Davis, Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium. {aaron.verachtert, hendrik.blockeel, jesse.davis}@cs.kuleuven.be
Pseudocode: Yes. The paper includes Algorithm 1 (Prediction with Stop Points) and Algorithm 2 (Training Naive Bayes with Stop Points).
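The paper's Algorithm 1 itself is not reproduced here. As a rough illustration of the underlying idea of prediction with stop points — evaluating attributes in a fixed order and halting early once the running Naive Bayes score is decisive — the sketch below may help. All names (`stop_points`, `threshold`, the dictionary-based model representation) are our own assumptions, not the paper's notation, and the sketch does not model how the paper learns stop points from validation data using its p and s parameters.

```python
def predict_with_stop_points(x, log_prior, log_lik, order, stop_points, threshold):
    """Hedged sketch of early-stopped Naive Bayes prediction (binary classes).

    x           -- dict mapping attribute id -> observed value
    log_prior   -- dict mapping class -> log P(class)
    log_lik     -- dict mapping (class, attr, value) -> log P(value | class)
    order       -- attribute ids in the fixed evaluation order
    stop_points -- positions (number of attributes read) where stopping is allowed
    threshold   -- minimum absolute log-odds needed to stop early (assumed form)

    Returns (predicted class, number of attributes actually read).
    """
    score = dict(log_prior)
    used = 0
    for i, attr in enumerate(order):
        for c in score:
            # Standard Naive Bayes update: add the log-likelihood of this attribute.
            score[c] += log_lik[(c, attr, x[attr])]
        used = i + 1
        if used in stop_points:
            a, b = sorted(score)
            # Stop early once the two class scores are far enough apart.
            if abs(score[a] - score[b]) >= threshold:
                break
    return max(score, key=score.get), used
```

With a low threshold the prediction can be made after a single attribute, which is the efficiency gain the static k-attribute baseline cannot realize.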
Open Source Code: No. The paper states "Full experimental results are available in the online supplement at http://dtai.cs.kuleuven.be/software/dsnb", but does not explicitly mention the availability of source code for the methodology.
Open Datasets: Yes. "We perform an evaluation using seven data sets from various domains, summarized in Table 1. ... As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004]..."
Dataset Splits: Yes. "Each data set is split into a training set, a validation set and a test set. As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004], where the training set consists of the first 23,149 instances, the validation set consists of the next 23,149 instances, and the test set consists of the remaining instances. For the other data sets, we randomly select 40% of the examples for training, 20% for validation, and 40% for testing."
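The random 40/20/40 split described above is straightforward to reproduce; a minimal sketch (function name and seeding are our own, and the chronological RCV1-v2 split would be handled separately by simply slicing the first 23,149, the next 23,149, and the remaining instances):

```python
import random

def split_40_20_40(examples, seed=0):
    """Randomly split examples into 40% train / 20% validation / 40% test."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.4 * n)
    n_valid = int(0.2 * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```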
Hardware Specification: Yes. "For efficiency, we begin by using a Raspberry Pi system (Model B+, 512MB, with Power Bank battery pack supplying 5V at a maximum of 1A) running Raspbian and a Java Virtual Machine, and measure energy consumption using a Tenma digital multimeter 727730A."
Software Dependencies: No. The paper mentions "running Raspbian and a Java Virtual Machine" but does not provide specific version numbers for software components or libraries.
Experiment Setup: Yes. "For our approach, we set p = 0.05 and s = 0.05 and did not try other values."