Does Head Label Help for Long-Tailed Multi-Label Text Classification

Authors: Lin Xiao, Xiangliang Zhang, Liping Jing, Chi Huang, Mingyang Song (pp. 14103-14111)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on three benchmark datasets demonstrate that HTTN consistently outperforms the state-of-the-art methods.
Researcher Affiliation | Academia | 1 Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China; 2 King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Pseudocode | No | The paper describes the proposed method in detail with text and figures, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and hyper-parameter settings are released for reproducibility (footnote 1: https://github.com/xiaolin1207/HTTN-master).
Open Datasets | Yes | Three multi-label text datasets are used to evaluate the HTTN model: AAPD, RCV1, and EUR-Lex. Their label distributions all follow a power-law distribution, as shown in Figure 1. These datasets have predefined training and testing splits, and the same data usage is followed for all evaluated models.
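For context, a minimal sketch of how one could verify the power-law (long-tailed) label distribution the response refers to. This assumes Python with the standard library only; `labels_per_doc` is a hypothetical stand-in for the label lists parsed from the dataset files:

```python
from collections import Counter

# Toy stand-in for the per-document label lists parsed from, e.g., AAPD.
labels_per_doc = [["cs.IT", "math.IT"], ["cs.LG"], ["cs.LG", "stat.ML"]]

# Count how often each label occurs across the corpus.
freq = Counter(label for doc in labels_per_doc for label in doc)

# In a power-law distribution, counts fall off sharply with rank:
# a few head labels dominate and most tail labels are rare.
for rank, (label, count) in enumerate(freq.most_common(), start=1):
    print(rank, label, count)
```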
Dataset Splits | No | The paper states, 'These datasets have defined the training and testing split. We follow the same data usage for all evaluated models.' However, it does not provide specific details on a validation split (e.g., percentages or counts) or how it was used beyond implying predefined splits.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory); it makes only general statements about parameters and training.
Software Dependencies | No | The paper mentions using GloVe for word embeddings and Adam for optimization, but it does not list software dependencies with version numbers (e.g., a deep learning framework such as PyTorch or TensorFlow, or other libraries).
Experiment Setup | Yes | Parameter Setting: For all three datasets, we use GloVe (Pennington, Socher, and Manning 2014) to obtain 300-dim word embeddings. The LSTM hidden state dimension k is set to 300. The parameter d = 128 for W2 and Wtransfer. The number of sampled instances t for the label prototypes in AAPD, RCV1, and EUR-Lex is t = 5, 5, and 1, respectively. The whole model is trained via Adam (Kingma and Ba 2014) with a learning rate of 0.001. AAPD and RCV1 have 54 and 103 labels, respectively. To test performance on different numbers of tail labels, we set ltail = 18 and 9 in AAPD, and ltail = 28 and 14 in RCV1. For the EUR-Lex dataset, we select the last 768 one-shot tail labels and the 1238 less-than-three-shot tail labels. For the ensemble HTTN, we set G = 30 for AAPD and RCV1, and G = 1 for EUR-Lex, because there are many one-shot labels in EUR-Lex. We used the default parameters for the DXML, XML-CNN, EXAM, and LTMCP models. The baselines OLTR, Imprinting, and BBN address the long-tail problem in image recognition, where the feature extractors were ResNet-10, ResNet-32, and others; for a fair comparison, we replace the feature extractor with a Bi-LSTM with attention. The parameters of all baselines are either adopted from their original papers or determined by experiments.
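To make the reported configuration concrete, below is a minimal PyTorch sketch of a Bi-LSTM-with-attention encoder using the stated dimensions (300-dim embeddings, hidden size 300, d = 128, Adam with learning rate 0.001, 54 labels for AAPD). This is an illustrative reconstruction, not the authors' implementation; names such as `AttnBiLSTM` and `vocab_size` are assumptions, and the released code at https://github.com/xiaolin1207/HTTN-master is authoritative.

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Illustrative Bi-LSTM with attention, the feature extractor the paper
    swaps into the image-recognition baselines (OLTR, Imprinting, BBN)."""
    def __init__(self, vocab_size, embed_dim=300, hidden=300, d=128, num_labels=54):
        super().__init__()
        # In practice the embedding would be initialized from 300-dim GloVe vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # per-token attention scores
        self.w2 = nn.Linear(2 * hidden, d)     # d = 128, as for W2 in the paper
        self.out = nn.Linear(d, num_labels)    # label logits

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))      # (B, T, 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)       # (B, T, 1) attention weights
        doc = (a * h).sum(dim=1)                     # attention-pooled document vector
        return self.out(torch.relu(self.w2(doc)))    # (B, num_labels)

model = AttnBiLSTM(vocab_size=30000)                 # vocab size is a placeholder
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
logits = model(torch.randint(0, 30000, (2, 50)))    # batch of 2 docs, 50 tokens each
```

Setting `num_labels=103` would match RCV1; the remaining HTTN-specific pieces (label prototypes, Wtransfer, the ensemble with G members) sit on top of an encoder like this one.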