A no-regret generalization of hierarchical softmax to extreme multi-label classification

Authors: Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, Krzysztof Dembczyński

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also show that our implementation of PLTs, referred to as EXTREMETEXT (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size, and prediction time, which makes it amenable to deployment in an online system.
Researcher Affiliation | Collaboration | Marek Wydmuch (Institute of Computing Science, Poznan University of Technology, Poland); Kalina Jasinska (Institute of Computing Science, Poznan University of Technology, Poland); Mikhail Kuznetsov (Yahoo! Research, New York, USA); Róbert Busa-Fekete (Yahoo! Research, New York, USA); Krzysztof Dembczyński (Institute of Computing Science, Poznan University of Technology, Poland)
Pseudocode | Yes | In Appendix D we additionally present the pseudocode of training and predicting with PLTs. (A hedged sketch of the PLT prediction idea follows this table.)
Open Source Code | Yes | Implementation of XT is available at https://github.com/mwydmuch/extremeText.
Open Datasets | Yes | We compare XT, the variant of PLTs discussed in the previous section, to the state-of-the-art algorithms on five benchmark datasets taken from the XMLC repository, and their text equivalents, by courtesy of Liu et al. (2017). ... Address of the XMLC repository: http://manikvarma.org/downloads/XC/XMLRepository.html
Dataset Splits | No | The paper mentions N_train and N_test but does not specify a dedicated validation split percentage or count for the datasets. While hyperparameters were tuned using grid search, the exact validation split is not detailed.
Hardware Specification | No | The paper states: "Computational experiments have been performed in Poznan Supercomputing and Networking Center." This is a general statement and does not include specific hardware details such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions building upon FASTTEXT and using L2 regularization but does not specify software names with version numbers (e.g., Python, PyTorch, specific library versions).
Experiment Setup | Yes | The range of hyperparameters of XT for grid search: minimum number of words {1, 2}; learning rate {0.05, 0.1, 0.25, 0.5, 0.75, 1.0}; word vector size {500}; epochs {5, 10, 15, 20}; L2 regularization {0.00001, 0.0001, 0.001, 0.01, 0.1}; min probability for nodes {0.001, 0.01, 0.05}. (The grid is enumerated in the second sketch after this table.)
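
The Pseudocode row above points to Appendix D of the paper for the actual training and prediction procedures. As a rough illustration only, not the authors' pseudocode, the sketch below shows the core PLT prediction idea: the probability of a label is the product of conditional node probabilities along the root-to-leaf path, so subtrees can be pruned once the running product falls below a threshold (compare the "min probability for nodes" hyperparameter in the Experiment Setup row). The `Node` type and its `predict_proba` field are hypothetical stand-ins for per-node probabilistic classifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical node type: each tree node holds a probabilistic classifier
# estimating P(node active | parent active, x); leaves carry a label id.
@dataclass
class Node:
    predict_proba: Callable[[object], float]
    children: List["Node"] = field(default_factory=list)
    label: Optional[int] = None  # set only for leaves

def plt_predict(root: Node, x, threshold: float = 0.01):
    """Return (label, probability) pairs whose path probability exceeds `threshold`.

    In a PLT, P(label | x) factorizes as the product of conditional node
    probabilities along the root-to-leaf path; since each factor is at most 1,
    the running product never increases, which justifies the pruning below.
    """
    results = []
    stack = [(root, root.predict_proba(x))]
    while stack:
        node, path_prob = stack.pop()
        if path_prob < threshold:
            continue  # prune: no leaf below this node can exceed the threshold
        if node.label is not None:
            results.append((node.label, path_prob))
        for child in node.children:
            stack.append((child, path_prob * child.predict_proba(x)))
    return sorted(results, key=lambda r: -r[1])
```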
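
The Experiment Setup row lists the grid searched over XT's hyperparameters. The minimal sketch below enumerates that grid; the dictionary keys are illustrative names chosen here, not extremeText's actual CLI options.

```python
from itertools import product

# Hyperparameter ranges copied from the Experiment Setup row above.
grid = {
    "min_word_count": [1, 2],
    "learning_rate": [0.05, 0.1, 0.25, 0.5, 0.75, 1.0],
    "word_vector_size": [500],
    "epochs": [5, 10, 15, 20],
    "l2_regularization": [0.00001, 0.0001, 0.001, 0.01, 0.1],
    "node_min_probability": [0.001, 0.01, 0.05],
}

# Cartesian product of all ranges, one dict per candidate configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 6 * 1 * 4 * 5 * 3 = 720 candidate configurations
```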