Hierarchical Text Classification as Sub-hierarchy Sequence Generation
Authors: SangHun Im, GiBaeg Kim, Heung-Seon Oh, Seongung Jo, Dong Hwan Kim
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | HiDEC achieved state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets such as RCV1-v2, NYT, and EURLEX57K. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, Korea University of Technology and Education (KOREATECH) {tkrhkshdqn, fk0214, ohhs, oowhat, hwan6615}@koreatech.ac.kr |
| Pseudocode | Yes | Algorithm 1: Recursive Hierarchy Decoding in Inference (a hedged decoding sketch follows this table) |
| Open Source Code | Yes | Code is available on https://github.com/SangHunIm/HiDEC |
| Open Datasets | Yes | For the standard evaluation, two small-scale datasets, RCV1-v2 (Lewis et al. 2004) and NYT (Sandhaus 2008), and one large-scale dataset, EURLEX57K (Chalkidis et al. 2019), were chosen. |
| Dataset Splits | Yes | RCV1-v2 comprises 804,414 news documents, divided into 23,149 and 781,265 documents for training and testing, respectively, as benchmark splits. We randomly sampled 10% of the training data as the validation data for model selection. NYT comprises 36,471 news documents divided into 29,179 and 7,292 documents for training and testing, respectively. For a fair comparison, we followed the data configurations of previous work (Zhou et al. 2020; Chen et al. 2021). In particular, EURLEX57K is a large-scale hierarchy with 57,000 documents and 4,271 labels. Benchmark splits of 45,000, 6,000, and 6,000 were used for training, validation, and testing, respectively. |
| Hardware Specification | Yes | All models were implemented using PyTorch (Paszke et al. 2019) and trained using NVIDIA A6000. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2019)' for implementation but does not specify its version or other key software dependencies with their respective version numbers (e.g., Python version, CUDA version, other libraries). |
| Experiment Setup | Yes | The size of the hidden state was set to 300. The word embeddings in the text encoder were initialized using 300-dimensional GloVe (Pennington, Socher, and Manning 2014). For HiDEC, a layer with two heads was used for both the GRU-based encoder and BERT. The label and level embeddings, with 300 and 768 dimensions for the GRU-based encoder and BERT, respectively, were initialized using a normal distribution with µ=0 and σ=300^-0.5. The hidden state size in the attentive layer was the same as the label embedding size. The FFN comprised two FC layers with 600- and 3,072-dimensional feed-forward filters for the GRU-based encoder and BERT, respectively. The threshold for recursive hierarchy decoding was set to 0.5. Dropout with probabilities of 0.5, 0.1, and 0.1 was applied to the embedding layer, behind every FFN, and behind every attention layer, respectively. For optimization, the Adam optimizer (Kingma and Ba 2015) was used with learning rate lr=1e-4, β1=0.9, β2=0.999, and eps=1e-8. The mini-batch size was set to 256 for GRU-based models. With BERT as the text encoder, lr and the mini-batch size were set to 5e-5 and 64, respectively. The lr was controlled using a linear schedule with a warmup rate of 0.1. Gradient clipping with a maximum gradient norm of 1.0 was performed to prevent gradient overflow. (A hedged PyTorch sketch of this optimization setup appears below the table.) |
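
The recursive hierarchy decoding named in the Pseudocode row (Algorithm 1) can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the linked repository for that): `children` and `score_children` are hypothetical placeholders for the taxonomy structure and HiDEC's per-child scoring; only the 0.5 threshold comes from the reported setup.

```python
# Minimal sketch of recursive hierarchy decoding in inference, under assumptions:
# `children` maps a label id to its child label ids in the taxonomy, and
# `score_children(doc, node, ids)` returns one probability in [0, 1] per child.
# Both names are illustrative and not taken from the paper or its code.

THRESHOLD = 0.5  # decoding threshold reported in the experiment setup

def recursive_hierarchy_decode(doc, children, score_children, root=0):
    """Expand the label hierarchy level by level, keeping every child whose
    probability exceeds the threshold, until no kept node expands further."""
    predicted = set()
    frontier = [root]                      # start from the (virtual) root label
    while frontier:
        next_frontier = []
        for node in frontier:
            child_ids = children.get(node, [])
            if not child_ids:
                continue                   # leaf node: nothing to expand
            probs = score_children(doc, node, child_ids)
            for cid, p in zip(child_ids, probs):
                if p > THRESHOLD:          # keep this label and expand it next
                    predicted.add(cid)
                    next_frontier.append(cid)
        frontier = next_frontier
    return predicted
```

Decoding stops once no kept label has a child above the threshold, which reflects the paper's framing of emitting only the relevant sub-hierarchy rather than scoring the entire label set.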
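
The optimization details in the Experiment Setup row map onto standard PyTorch components. Below is a hedged sketch, assuming generic `model`, `batch`, and `compute_loss` placeholders; the Adam hyperparameters, the linear schedule with a 0.1 warmup rate, and the gradient clipping norm of 1.0 come from the table above, while everything else is illustrative.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer_and_scheduler(model, total_steps, lr=1e-4, warmup_rate=0.1):
    # Adam with the reported betas/eps; lr is 1e-4 for the GRU-based encoder
    # and 5e-5 when BERT is used as the text encoder.
    optimizer = Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8)
    warmup_steps = int(total_steps * warmup_rate)

    def linear_schedule(step):
        # Linear warmup to the base lr, then linear decay toward zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = LambdaLR(optimizer, lr_lambda=linear_schedule)
    return optimizer, scheduler

def train_step(model, batch, optimizer, scheduler, compute_loss):
    optimizer.zero_grad()
    loss = compute_loss(model, batch)      # placeholder loss function
    loss.backward()
    # Gradient clipping with a maximum norm of 1.0, as in the reported setup.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    return loss.item()
```

In practice, `total_steps` would be the number of mini-batches times the number of epochs, with the mini-batch size set to 256 for the GRU-based encoder or 64 for BERT, as reported in the table.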