Does Tail Label Help for Large-Scale Multi-Label Learning?

Authors: Tong Wei, Yu-Feng Li

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments clearly verify that both the prediction time and the model size are significantly reduced without sacrificing much predictive performance for state-of-the-art approaches.
Researcher Affiliation | Academia | Tong Wei and Yu-Feng Li, National Key Laboratory for Novel Software Technology, Nanjing University; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China; {weit, liyf}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1 ADATTL
  Input: feature vectors X ∈ R^(N×d); label vectors Y ∈ R^(N×L); hyper-parameters α, β and sample size T
  Output: the fraction λ of tail labels
  1: for t = 1, 2, …, T do
  2:   trim a randomly selected fraction λ_{τt} of tail labels, resulting in the remaining label set L_{τt}
  3:   sample a subset of training examples D_{τt} randomly
  4:   train large-scale multi-label model f_t on (D_{τt}, L_{τt})
  5:   compute perf_t, time_t and size_t
  6: end for
  7: fit {(|D_{τj}|, λ_{τj}, perf_j)}_{j=1..T}, {(|D_{τj}|, λ_{τj}, time_j)}_{j=1..T} and {(|D_{τj}|, λ_{τj}, size_j)}_{j=1..T} with polynomial surfaces
  8: optimize Eq. (7) and obtain λ
  9: return λ
Open Source Code | No | The paper states, "All the datasets and implementation of LEML and FastXML are publicly available and can be downloaded from the Extreme Classification Repository," which refers to the baseline models. There is no explicit statement or link provided for the source code of the authors' own ADATTL method.
Open Datasets | Yes | Experiments are carried out on multi-label datasets including Bibtex (159 labels), Delicious (983 labels), EUR-Lex (3993 labels) and Wiki10 (30K labels). All the datasets and implementation of LEML and FastXML are publicly available and can be downloaded from the Extreme Classification Repository: http://manikvarma.org/downloads/XC/XMLRepository.html
Dataset Splits | No | The paper refers to "training set size" and "testing performance" but does not explicitly mention a validation set or provide specific train/validation/test splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, computational cluster specifications) used to run the experiments.
Software Dependencies | No | The paper mentions using LEML and FastXML implementations but does not specify any software dependencies with version numbers (e.g., Python version, specific library versions such as PyTorch or scikit-learn).
Experiment Setup | Yes | We use quartic polynomial functions to model the functions f, g and h. When modelling function f, we use the sum of P@k and nDCG@k, k ∈ {1, 2, 3}, as the overall testing performance. Default values of the parameters for LEML and FastXML are used, and the hyper-parameters α and β in Eq. (7) are set to 1.
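The sampling loop of Algorithm 1 together with the quartic-surface fitting from the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: `train_and_evaluate` is a synthetic stand-in for training LEML/FastXML and measuring performance, time and model size, and the final trade-off objective is only a plausible stand-in for the paper's Eq. (7) with α = β = 1 as in the reported setup.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def train_and_evaluate(n_examples, lam):
    """Synthetic stand-in for steps 2-5 of Algorithm 1: returns
    (perf, time, size) after trimming a fraction `lam` of tail labels
    and training on `n_examples` sampled examples."""
    perf = 1.0 - 0.5 * lam**2 + 0.01 * rng.standard_normal()  # performance degrades as labels are trimmed
    time = n_examples * (1.0 - 0.6 * lam)                     # trimming reduces prediction time
    size = n_examples * (1.0 - 0.8 * lam)                     # and shrinks the model
    return perf, time, size

# Steps 1-6: sample T configurations (|D|, lambda) and record the metrics.
T = 50
records = []
for _ in range(T):
    n = rng.integers(1_000, 10_000)
    lam = rng.uniform(0.0, 0.9)
    records.append((n, lam, *train_and_evaluate(n, lam)))
data = np.array(records)  # columns: |D|, lambda, perf, time, size

SCALE = 10_000.0                  # scale |D| to [0, 1] for a well-conditioned fit
x_col = data[:, 0] / SCALE
lam_col = data[:, 1]

def fit_quartic_surface(x, y, z):
    """Step 7: least-squares fit of z as a bivariate polynomial surface
    in (x, y) with total degree at most 4."""
    terms = [(i, j) for i, j in product(range(5), range(5)) if i + j <= 4]
    A = np.column_stack([x**i * y**j for i, j in terms])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return lambda xq, yq: sum(c * xq**i * yq**j for c, (i, j) in zip(coef, terms))

f = fit_quartic_surface(x_col, lam_col, data[:, 2])  # performance surface
g = fit_quartic_surface(x_col, lam_col, data[:, 3])  # time surface
h = fit_quartic_surface(x_col, lam_col, data[:, 4])  # model-size surface

# Step 8 (stand-in for Eq. (7)): with alpha = beta = 1, pick the lambda that
# trades off predicted performance against normalized time and size on a grid.
alpha = beta = 1.0
x_full = 1.0  # full training set, on the scaled axis
grid = np.linspace(0.0, 0.9, 91)
scores = [f(x_full, l)
          - alpha * g(x_full, l) / g(x_full, 0.0)
          - beta * h(x_full, l) / h(x_full, 0.0)
          for l in grid]
best_lam = float(grid[int(np.argmax(scores))])  # step 9: the returned fraction
```

Under these synthetic curves the time and size savings outweigh the quadratic performance loss, so the grid search favors a large trimming fraction; with real learners the shape of the fitted surfaces, and hence the chosen λ, would differ.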