Dynamic Early Stopping for Naive Bayes
Authors: Aäron Verachtert, Hendrik Blockeel, Jesse Davis
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Evaluation: The goal of our empirical evaluation is to answer the following questions: 1. For a fixed attribute budget k, how does the dynamic approach compare to the static one in terms of efficiency and accuracy? 2. How does our proposed approach compare to same-decision probability (SDP) [Choi et al., 2012] and cost-sensitive Naive Bayes (csNB) [Chai et al., 2004]? These represent two other ways to dynamically make a prediction based on only a subset of the attributes. To answer the first question, we train two types of models for each attribute ordering. The baseline is a static model that always uses k attributes to make a prediction. Then we learn a dynamic model according to our proposed approach that can consider at most k attributes at prediction time. To explore a range of different operating conditions, we investigate various values of k. 5.1 Data Sets: We perform an evaluation using seven data sets from various domains, summarized in Table 1. |
| Researcher Affiliation | Academia | Aäron Verachtert, Hendrik Blockeel, and Jesse Davis, Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium {aaron.verachtert, hendrik.blockeel, jesse.davis}@cs.kuleuven.be |
| Pseudocode | Yes | Algorithm 1 Prediction with Stop Points and Algorithm 2 Training Naive Bayes with Stop Points (a hedged sketch of the prediction loop appears after this table) |
| Open Source Code | No | The paper states "Full experimental results are available in the online supplement at http://dtai.cs.kuleuven.be/software/dsnb" but does not explicitly mention the availability of source code for the methodology. |
| Open Datasets | Yes | We perform an evaluation using seven data sets from various domains, summarized in Table 1. ... As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004]... |
| Dataset Splits | Yes | Each data set is split into a training set, a validation set and a test set. As RCV1-v2 is usually treated as a temporal data set, we use the standard chronological split [Lewis et al., 2004], where the training set consists of the first 23,149 instances, the validation set consists of the next 23,149 instances, and the test set consists of the remaining instances. For the other data sets, we randomly select 40% of the examples for training, 20% for validation, and 40% for testing. (See the split sketch after this table.) |
| Hardware Specification | Yes | For efficiency, we begin by using a Raspberry Pi system (Model B+, 512MB, with Power Bank battery pack supplying 5V at a maximum of 1A) running Raspbian and a Java Virtual Machine, and measure energy consumption using a Tenma digital multimeter 72-7730A. |
| Software Dependencies | No | The paper mentions 'running Raspbian and a Java Virtual Machine' but does not provide specific version numbers for software components or libraries. |
| Experiment Setup | Yes | For our approach, we set p = 0.05 and s = 0.05 and did not try other values. |
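
To make the Pseudocode row concrete, the sketch below illustrates the general idea behind "Prediction with Stop Points": scan the attributes in a fixed order and halt as soon as the running prediction is confident enough or the budget k is exhausted. This is an illustrative reconstruction, not the authors' code; the attribute ordering `order`, the per-position thresholds `stop_margin`, and the log-probability tables are assumed to come from a training step analogous to the paper's Algorithm 2 (which uses the parameters p and s quoted in the Experiment Setup row).

```python
def predict_with_stop_points(x, log_prior, log_lik, order, stop_margin, k):
    """Hypothetical sketch of early-stopped Naive Bayes prediction.

    x           -- instance attribute values (dict: attribute -> value)
    log_prior   -- dict: class -> log P(class)
    log_lik     -- dict: (attribute, value, class) -> log P(value | class)
    order       -- attributes in the fixed evaluation order (assumed given)
    stop_margin -- per-position confidence thresholds standing in for the
                   learned stop points (assumed; the paper learns these
                   from validation data)
    k           -- maximum attribute budget

    Assumes at least two classes. Returns (predicted class, attributes used).
    """
    scores = dict(log_prior)          # running log-posterior per class
    used = 0
    for i, attr in enumerate(order[:k]):
        for c in scores:
            scores[c] += log_lik[(attr, x[attr], c)]
        used = i + 1
        ranked = sorted(scores.values(), reverse=True)
        # Stop early once the leading class beats the runner-up by the
        # position-specific margin; otherwise continue up to k attributes.
        if ranked[0] - ranked[1] >= stop_margin[i]:
            break
    best = max(scores, key=scores.get)
    return best, used
```

The fixed-margin test is only a stand-in for the paper's learned stop points; the point of the sketch is the control flow, i.e. that the number of attributes inspected varies per instance but never exceeds k.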
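
The Dataset Splits row quotes exact proportions, so the splits themselves are straightforward to reproduce. A minimal sketch, assuming instances are held in a Python list (the random seed is an assumption; the paper does not report one):

```python
import random

def chronological_split(instances, n_train=23149, n_valid=23149):
    """RCV1-v2 style chronological split: first n_train instances for
    training, next n_valid for validation, remainder for testing."""
    train = instances[:n_train]
    valid = instances[n_train:n_train + n_valid]
    test = instances[n_train + n_valid:]
    return train, valid, test

def random_split(instances, seed=0):
    """Random 40% / 20% / 40% train / validation / test split used for
    the other data sets."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.4 * n)
    n_valid = int(0.2 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])
```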