Dynamic Early Stopping for Naive Bayes

Authors: Aäron Verachtert, Hendrik Blockeel, Jesse Davis

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. From Section 5 (Experimental Evaluation): "The goal of our empirical evaluation is to answer the following questions: 1. For a fixed attribute budget k, how does the dynamic approach compare to the static one in terms of efficiency and accuracy? 2. How does our proposed approach compare to same-decision probability (SDP) [Choi et al., 2012] and cost-sensitive Naive Bayes (csNB) [Chai et al., 2004]? These represent two other ways to dynamically make a prediction based on only a subset of the attributes. To answer the first question, we train two types of models for each attribute ordering. The baseline is a static model that always uses k attributes to make a prediction. Then we learn a dynamic model according to our proposed approach that can consider at most k attributes at prediction time. To explore a range of different operating conditions, we investigate various values of k." From Section 5.1 (Data Sets): "We perform an evaluation using seven data sets from various domains, summarized in Table 1."
Researcher Affiliation: Academia. Aäron Verachtert, Hendrik Blockeel, and Jesse Davis, Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium. {aaron.verachtert, hendrik.blockeel, jesse.davis}@cs.kuleuven.be
Pseudocode: Yes. The paper includes Algorithm 1 (Prediction with Stop Points) and Algorithm 2 (Training Naive Bayes with Stop Points).
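The paper's Algorithm 1 itself is not reproduced here. As a rough illustration of the underlying idea of prediction with stop points — evaluating attributes in a fixed order and halting early once the running Naive Bayes score is decisive — the sketch below may help. All names (`stop_points`, `threshold`, the dictionary-based model representation) are our own assumptions, not the paper's notation, and the sketch does not model how the paper learns stop points from validation data using its p and s parameters.

```python
def predict_with_stop_points(x, log_prior, log_lik, order, stop_points, threshold):
    """Hedged sketch of early-stopped Naive Bayes prediction (binary classes).

    x           -- dict mapping attribute id -> observed value
    log_prior   -- dict mapping class -> log P(class)
    log_lik     -- dict mapping (class, attr, value) -> log P(value | class)
    order       -- attribute ids in the fixed evaluation order
    stop_points -- positions (number of attributes read) where stopping is allowed
    threshold   -- minimum absolute log-odds needed to stop early (assumed form)

    Returns (predicted class, number of attributes actually read).
    """
    score = dict(log_prior)
    used = 0
    for i, attr in enumerate(order):
        for c in score:
            # Standard Naive Bayes update: add the log-likelihood of this attribute.
            score[c] += log_lik[(c, attr, x[attr])]
        used = i + 1
        if used in stop_points:
            a, b = sorted(score)
            # Stop early once the two class scores are far enough apart.
            if abs(score[a] - score[b]) >= threshold:
                break
    return max(score, key=score.get), used
```

With a low threshold the prediction can be made after a single attribute, which is the efficiency gain the static k-attribute baseline cannot realize.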
Open Source Code: No. The paper states "Full experimental results are available in the online supplement at http://dtai.cs.kuleuven.be/software/dsnb", but does not explicitly mention the availability of source code for the methodology.
Open Datasets: Yes. "We perform an evaluation using seven data sets from various domains, summarized in Table 1. ... As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004]..."
Dataset Splits: Yes. "Each data set is split into a training set, a validation set and a test set. As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004], where the training set consists of the first 23,149 instances, the validation set consists of the next 23,149 instances, and the test set consists of the remaining instances. For the other data sets, we randomly select 40% of the examples for training, 20% for validation, and 40% for testing."
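The random 40/20/40 split described above is straightforward to reproduce; a minimal sketch (function name and seeding are our own, and the chronological RCV1-v2 split would be handled separately by simply slicing the first 23,149, the next 23,149, and the remaining instances):

```python
import random

def split_40_20_40(examples, seed=0):
    """Randomly split examples into 40% train / 20% validation / 40% test."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.4 * n)
    n_valid = int(0.2 * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```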
Hardware Specification: Yes. "For efficiency, we begin by using a Raspberry Pi system (Model B+, 512MB, with Power Bank battery pack supplying 5V at a maximum of 1A) running Raspbian and a Java Virtual Machine, and measure energy consumption using a Tenma digital multimeter 727730A."
Software Dependencies: No. The paper mentions "running Raspbian and a Java Virtual Machine" but does not provide specific version numbers for software components or libraries.
Experiment Setup: Yes. "For our approach, we set p = 0.05 and s = 0.05 and did not try other values."