Binary Partitions with Approximate Minimum Impurity

Authors: Eduardo Laber, Marco Molinaro, Felipe Mello Pereira

ICML 2018

Reproducibility Variable Result LLM Response
Research Type Experimental We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction. To complement our theoretical findings, in Section 5 we present a set of experiments where we compare the proposed methods with PCext and SLIQext.
Researcher Affiliation Academia 1Departamento de Informática, PUC-Rio, Brazil. Correspondence to: Eduardo Laber <eduardo.laber1@gmail.com>.
Pseudocode Yes Algorithm 1 Bd (V: collection of vectors, I: impurity measure)
1: For each v in V let r(v) = (d · v)/‖v‖₁
2: Rank the vectors in V according to r(v)
3: for j = 1, . . . , n − 1 do
4:   Pj ← subset of V containing the j vectors with the largest value of r(·)
5:   Evaluate the impurity of the partition (Pj, V \ Pj)
6: end for
7: Return the partition (Pj*, V \ Pj*) with the smallest impurity found in the loop
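The rank-and-sweep procedure above can be sketched in Python. This is an illustrative reconstruction, not the authors' implementation: the direction vector `d`, the `impurity` callback, and the toy weighted-Gini measure used in the example are all assumptions made here for demonstration.

```python
import numpy as np

def bd_partition(V, d, impurity):
    """Sketch of Algorithm 1 (Bd): score each class-count vector v by
    r(v) = (d . v) / ||v||_1, rank the vectors by r, and try every prefix
    of the ranking as one side of a binary partition, keeping the split
    with the smallest impurity. Vectors are assumed non-zero."""
    V = [np.asarray(v, dtype=float) for v in V]
    r = [np.dot(d, v) / np.abs(v).sum() for v in V]     # step 1
    order = sorted(range(len(V)), key=lambda i: -r[i])  # step 2: rank, descending
    best = None
    for j in range(1, len(V)):                          # steps 3-6
        left = [V[i] for i in order[:j]]
        right = [V[i] for i in order[j:]]
        imp = impurity(left, right)
        if best is None or imp < best[0]:
            best = (imp, left, right)
    return best[1], best[2]                             # step 7

# A toy impurity for the demo: size-weighted Gini of each side
# (an assumption; the paper covers a family of impurity measures).
def gini_side(vectors):
    total = np.sum(vectors, axis=0)  # per-class counts of this side
    n = total.sum()
    p = total / n
    return n * (1.0 - np.sum(p ** 2))

def weighted_gini(left, right):
    return gini_side(left) + gini_side(right)
```

For instance, with two-class count vectors `[[8, 1], [1, 9], [7, 2], [2, 6]]` and direction `d = [1, -1]`, the sweep groups the class-0-heavy vectors on one side and the class-1-heavy vectors on the other.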
Open Source Code Yes The algorithms were implemented in Python 3 using numpy and are available in https://github.com/felipeamp/icml-2018.
Open Datasets Yes We employed 11 datasets in total. Eight of them are from the UCI repository: Mushroom, KDD98, Adult, Nursery, Covertype, Cars, Contraceptive and Poker (Lichman, 2013). Two others are available in Kaggle: San Francisco Crime and Shelter Animal Outcome (SF-Open Data; Austin Animal-Center).
Dataset Splits Yes We employed a 95% one-tailed paired Student's t-test to compare the accuracy attained by the methods over 20 3-fold stratified cross-validations.
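The paired comparison described here can be sketched in pure Python. The function name, the example accuracy values, and the hardcoded critical value are assumptions for illustration; 20 repetitions of 3-fold CV yield 60 paired scores, and the one-tailed 5% critical value for df = 59 is roughly 1.671.

```python
import math

def paired_t_statistic(a, b):
    """t statistic for a paired test: a[i] and b[i] are accuracies of the
    two methods on the same cross-validation fold."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Method A is declared significantly better than B at the 95% level when
# the statistic exceeds the one-tailed critical value (~1.671 for df = 59).
```

With toy fold accuracies `a = [0.91, 0.90, 0.92, 0.89]` and `b = [0.88, 0.89, 0.90, 0.87]`, the statistic comes out near 4.9, well above the threshold.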
Hardware Specification Yes All the experiments were executed on a PC Intel i7-6500U CPU with 2.5GHz and 8GB of RAM.
Software Dependencies No The algorithms were implemented in Python 3 using numpy and are available in https://github.com/felipeamp/icml-2018. While Python 3 is mentioned, a specific version for numpy is not provided.
Experiment Setup Yes All experiments are Monte Carlo simulations with 10,000 runs, each using a randomly-generated contingency table for the given number of values n and classes k. Each table was created by uniformly picking a number in {0, . . . , 7} for each entry. We build decision trees with depth at most 16.
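The table-generation step of the Monte Carlo setup can be sketched as follows; the function name and seeding interface are illustrative assumptions, not the authors' code.

```python
import random

def random_contingency_table(n, k, rng=None):
    """One Monte Carlo trial's input: an n x k contingency table
    (n attribute values, k classes) with each entry drawn uniformly
    from {0, ..., 7}, as in the setup described above."""
    rng = rng or random.Random()
    return [[rng.randint(0, 7) for _ in range(k)] for _ in range(n)]
```

A full experiment would call this 10,000 times for each (n, k) pair and run the competing splitting methods on every table.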