CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning
Authors: Wissam Siblini, Pascale Kuntz, Frank Meyer
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches. |
| Researcher Affiliation | Collaboration | (1) Computer Science Laboratory of Nantes (LS2N), France; (2) Orange Labs Lannion, France. |
| Pseudocode | Yes | Algorithm 1 trainTree. Input: Training set with a feature matrix X and a label matrix Y. Initialize node v; v.isLeaf ← testStopCondition(X, Y); if v.isLeaf = false then v.classif ← trainNodeClassifier(X, Y); (X_child_i, Y_child_i)_{i=0,..,k−1} ← split(v.classif, X, Y); for i from 0 to k−1 do v.child_i ← trainTree(X_child_i, Y_child_i) end for; else v.ŷ ← computeMeanLabelVector(Y) end if. Output: node v |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of CRAFTML. It only mentions using 'MurmurHash3' as an external component. |
| Open Datasets | Yes | The numerical experiments are carried on nine XML datasets from different application domains: Bibtex, Mediamill, Delicious, EURLex-4K, Wiki10-31K, Delicious200K, AmazonCat-13K, WikiLSHTC-325K, Amazon670K. The instance, feature and label cardinalities are reported in the first column of Table 2 and additional details are available in the XML repository: http://manikvarma.org/downloads/XC/XMLRepository.html |
| Dataset Splits | Yes | To avoid test set overfitting here, we restrict ourselves to the training part of the datasets: a validation set with twenty percent of the instances is used for evaluation. |
| Hardware Specification | No | The paper mentions running experiments on 'a five-core machine' but does not provide specific details such as CPU model, clock speed, GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'Java for CRAFTML' and 'MurmurHash3' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | To limit size effects in the experimental comparisons, the chosen number of trees and stop condition are the same as for FastXML: $m_F = 50$ and $n_{leaf} = 10$ (Prabhu & Varma, 2014). As shown in Section 4.1, the label projection dimension $d'_y$ does not impact the time and memory complexities and it has consequently been fixed to an arbitrary high value: $d'_y = \min(d_y, 10000)$. The feature projection dimension $d'_x$ has also no effect on time and a very limited one on memory in practice. CRAFTML reaches the plateau of performances for each dataset for a sample size $n_s = 20000$ and a dimension $d'_x = \min(d_x, 10000)$. CRAFTML already reaches its best performances with only $i = 2$ iterations in the k-means. |
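
The recursive structure of Algorithm 1 quoted in the Pseudocode row can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' Java implementation: it omits the random feature and label projections, uses plain Euclidean k-means on raw label vectors, and sets the branching factor `K = 2` arbitrarily. The names `should_stop`, `train_node_classifier`, `split`, `predict` and the single-child guard are hypothetical choices made for this example; only `N_LEAF = 10` and the two k-means iterations come from the Experiment Setup row.

```python
import random

N_LEAF = 10        # leaf-size threshold n_leaf quoted in the table
KMEANS_ITERS = 2   # k-means iterations quoted in the table
K = 2              # branching factor: hypothetical value for this sketch


class Node:
    def __init__(self):
        self.is_leaf = False
        self.classif = None   # feature-space centroids, one per child
        self.children = []
        self.y_hat = None     # mean label vector, set on leaves only


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(centroids, x):
    # Index of the closest centroid; ties go to the lowest index.
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], x))


def should_stop(X, Y):
    # Assumed stop condition: few instances, identical features, or identical labels.
    return len(X) < N_LEAF or all(x == X[0] for x in X) or all(y == Y[0] for y in Y)


def train_node_classifier(X, Y, rng):
    # Cluster the label vectors, then summarize each cluster by its feature centroid.
    centroids = rng.sample(Y, K)
    assign = [0] * len(Y)
    for _ in range(KMEANS_ITERS):
        assign = [nearest(centroids, y) for y in Y]
        for c in range(K):
            members = [y for y, a in zip(Y, assign) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    groups = [[x for x, a in zip(X, assign) if a == c] for c in range(K)]
    return [mean_vector(g) for g in groups if g]


def split(classif, X, Y):
    # Route every instance to the child whose feature centroid is closest.
    parts = [([], []) for _ in classif]
    for x, y in zip(X, Y):
        c = nearest(classif, x)
        parts[c][0].append(x)
        parts[c][1].append(y)
    return parts


def train_tree(X, Y, rng):
    v = Node()
    v.is_leaf = should_stop(X, Y)
    if not v.is_leaf:
        classif = train_node_classifier(X, Y, rng)
        parts = split(classif, X, Y)
        kept = [i for i, (Xc, _) in enumerate(parts) if Xc]
        if len(kept) < 2:
            v.is_leaf = True  # guard added for this sketch, not part of Algorithm 1
        else:
            v.classif = [classif[i] for i in kept]
            v.children = [train_tree(parts[i][0], parts[i][1], rng) for i in kept]
    if v.is_leaf:
        v.y_hat = mean_vector(Y)
    return v


def predict(v, x):
    # Descend to a leaf and return its mean label vector.
    while not v.is_leaf:
        v = v.children[nearest(v.classif, x)]
    return v.y_hat


# Toy data: two well-separated feature groups, each with its own label.
X = [[i * 0.01, 0.0] for i in range(20)] + [[10 + i * 0.01, 10.0] for i in range(20)]
Y = [[1, 0]] * 20 + [[0, 1]] * 20
tree = train_tree(X, Y, random.Random(0))
```

On this toy data the root splits the two groups and each child becomes a leaf because its labels are identical, so `predict(tree, [0.05, 0.0])` returns the mean label vector of the first group. A forest in the spirit of the setup above would train several such trees and average their predictions; that step is left out here.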