A no-regret generalization of hierarchical softmax to extreme multi-label classification
Authors: Marek Wydmuch, Kalina Jasinska, Mikhail Kuznetsov, Róbert Busa-Fekete, Krzysztof Dembczyński
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also show that our implementation of PLTs, referred to as EXTREMETEXT (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive to many state-of-the-art approaches in terms of statistical performance, model size and prediction time which makes it amenable to deploy in an online system. |
| Researcher Affiliation | Collaboration | Marek Wydmuch, Institute of Computing Science, Poznan University of Technology, Poland; Kalina Jasinska, Institute of Computing Science, Poznan University of Technology, Poland; Mikhail Kuznetsov, Yahoo! Research, New York, USA; Róbert Busa-Fekete, Yahoo! Research, New York, USA; Krzysztof Dembczyński, Institute of Computing Science, Poznan University of Technology, Poland |
| Pseudocode | Yes | In Appendix D we additionally present the pseudocode of training and predicting with PLTs. |
| Open Source Code | Yes | Implementation of XT is available at https://github.com/mwydmuch/extremeText |
| Open Datasets | Yes | we compare XT, the variant of PLTs discussed in the previous section, to the state-of-the-art algorithms on five benchmark datasets taken from the XMLC repository, and their text equivalents, by courtesy of Liu et al. (2017). ... Address of the XMLC repository: http://manikvarma.org/downloads/XC/XMLRepository.html |
| Dataset Splits | No | The paper mentions "Ntrain" and "Ntest" but does not specify a dedicated validation split percentage or count for the datasets. While hyperparameters were tuned using grid search, the exact validation split is not detailed. |
| Hardware Specification | No | The paper states: "Computational experiments have been performed in Poznan Supercomputing and Networking Center." This is a general statement and does not include specific hardware details such as GPU/CPU models, memory, or detailed cloud instance types. |
| Software Dependencies | No | The paper mentions building upon FASTTEXT and using L2 regularization but does not specify software names with version numbers (e.g., Python, PyTorch, specific library versions). |
| Experiment Setup | Yes | The range of hyper-parameters of XT for grid search: 1. minimum number of words: {1, 2}, 2. learning rate: {0.05, 0.1, 0.25, 0.5, 0.75, 1.0}, 3. word vector size: {500}, 4. epoch: {5, 10, 15, 20}, 5. L2-regularization: {0.00001, 0.0001, 0.001, 0.01, 0.1}, 6. min-probability for nodes: {0.001, 0.01, 0.05}. |
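As a sanity check on the size of the reported search space, the grid above can be enumerated with `itertools.product`. This is only a sketch: the parameter names below are descriptive placeholders, not necessarily the actual extremeText command-line flags.

```python
# Hypothetical enumeration of the XT hyper-parameter grid reported in the paper.
# Keys are descriptive labels (assumptions), values are taken from the table above.
from itertools import product

grid = {
    "min_word_count": [1, 2],
    "learning_rate": [0.05, 0.1, 0.25, 0.5, 0.75, 1.0],
    "word_vector_size": [500],
    "epochs": [5, 10, 15, 20],
    "l2_regularization": [0.00001, 0.0001, 0.001, 0.01, 0.1],
    "node_min_probability": [0.001, 0.01, 0.05],
}

# Cartesian product of all value lists -> one dict per candidate configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 6 * 1 * 4 * 5 * 3 = 720 configurations
```

Enumerating the product makes the cost of the grid search explicit: 720 training runs per dataset, before accounting for any validation splits.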