reproducibilityindex.ai

CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning

Authors: Wissam Siblini, Pascale Kuntz, Frank Meyer

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches.
Researcher Affiliation	Collaboration	(1) Computer Science Laboratory of Nantes (LS2N), France; (2) Orange Labs Lannion, France.
Pseudocode	Yes	Algorithm 1 trainTree. Input: training set with a feature matrix X and a label matrix Y. Initialize node v; v.isLeaf ← testStopCondition(X, Y); if v.isLeaf = false then v.classif ← trainNodeClassifier(X, Y); (X_child_i, Y_child_i)_{i=0,..,k−1} ← split(v.classif, X, Y); for i from 0 to k−1 do v.child_i ← trainTree(X_child_i, Y_child_i) end for; else v.ŷ ← computeMeanLabelVector(Y) end if. Output: node v
Open Source Code	No	The paper does not provide an explicit statement or link for the open-source code of CRAFTML. It only mentions using 'MurmurHash3' as an external component.
Open Datasets	Yes	The numerical experiments are carried on nine XML datasets from different application domains: Bibtex, Mediamill, Delicious, EURLex-4K, Wiki10-31K, Delicious200K, AmazonCat-13K, WikiLSHTC-325K, Amazon670K. The instance, feature and label cardinalities are reported in the first column of Table 2 and additional details are available in the XML repository: http://manikvarma.org/downloads/XC/XMLRepository.html
Dataset Splits	Yes	To avoid test set overfitting here, we restrict ourselves to the training part of the datasets: a validation set with twenty percent of the instances is used for evaluation.
Hardware Specification	No	The paper mentions running experiments on 'a five-core machine' but does not provide specific details such as CPU model, clock speed, GPU models, or memory specifications.
Software Dependencies	No	The paper mentions 'Java for CRAFTML' and 'MurmurHash3' but does not specify version numbers for these or any other software dependencies.
Experiment Setup	Yes	To limit size effects in the experimental comparisons, the chosen number of trees and stop condition are the same as for FastXML: m_F = 50 and n_leaf = 10 (Prabhu & Varma, 2014). As shown in Section 4.1, the label projection dimension d'_y does not impact the time and memory complexities and it has consequently been fixed to an arbitrary high value: d'_y = min(d_y, 10000). The feature projection dimension d'_x has also no effect on time and a very limited one on memory in practice. CRAFTML reaches the plateau of performances for each dataset for a sample size n_s = 20000 and a dimension d'_x = min(d_x, 10000). CRAFTML already reaches its best performances with only i = 2 iterations in the k-means.