A Radical-Aware Attention-Based Model for Chinese Text Classification

Authors: Hanqing Tao, Shiwei Tong, Hongke Zhao, Tong Xu, Binbin Jin, Qi Liu (pp. 5125-5132)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental we conduct extensive experiments, where the experimental results not only show the superiority of our model, but also validate the effectiveness of radicals in the task of Chinese text classification.
Researcher Affiliation Academia Anhui Province Key Laboratory of Big Data Analysis and Application, University of Science and Technology of China; School of Data Science, University of Science and Technology of China. {hqtao, tongsw, zhhk, bb0725}@mail.ustc.edu.cn, {tongxu, qiliuql}@ustc.edu.cn
Pseudocode No The paper includes a model diagram (Figure 2) but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code No The paper does not provide an explicit statement about releasing its own source code or a direct link to a code repository for the methodology described.
Open Datasets Yes To fit the problems studied in this paper, we choose a public Chinese text dataset (Zhou et al. 2016) which is suitable for our work.
Dataset Splits Yes Dataset#1. It contains 47,952 Chinese news titles with 32 gold standard classification labels for training and 15,986 titles for testing. ... Dataset#2. After the processing, we still have more than 75% of the raw data: 36,431 texts for training and 12,267 texts for testing.
Hardware Specification No The paper mentions 'several GPUs accelerating the experimental process' but does not specify any particular GPU models, CPU models, or other detailed hardware specifications.
Software Dependencies No The paper mentions using 'jieba as the word segmentation tool', 'word2vec tool (Gensim)', and 'MXNet' but does not provide specific version numbers for any of these software dependencies.
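The preprocessing toolchain named above (jieba for segmentation, Gensim's word2vec for embeddings) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the corpus is a placeholder, and the embedding parameters (vector_size, window, min_count) are assumptions since the paper gives no versions or settings for these tools.

```python
# Minimal sketch of the stated preprocessing tools (jieba + Gensim word2vec).
# All hyperparameters below are illustrative assumptions, not from the paper.
import jieba
from gensim.models import Word2Vec

titles = ["一个中文新闻标题", "另一个中文新闻标题"]  # placeholder corpus

# Word-level segmentation with jieba, the paper's stated segmentation tool.
segmented = [list(jieba.cut(t)) for t in titles]

# Train word embeddings with Gensim's word2vec implementation
# (vector_size is the Gensim 4.x argument name; older versions use size).
w2v = Word2Vec(sentences=segmented, vector_size=100, window=5, min_count=1)
print(w2v.wv.most_similar(segmented[0][0]))
```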
Experiment Setup Yes In RAFG, we empirically set the dimension of hidden vectors of each BLSTM to 256. To avoid overfitting, when we get the embeddings of characters, words, character-level radicals and word-level radicals, we drop 50% of them. In addition, we have tried some learning rates and finally set the learning rate to 0.03... Furthermore, we set the batchsize to 32 and the epoch of training process to 200. Finally, we use Precision (P), Recall (R) and F1-measure (F1) to evaluate the performance (Hotho, Nürnberger, and Paaß 2005; Qiao et al. 2019).
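For reference, the reported configuration can be sketched against MXNet Gluon, the framework the paper names. Only the numbers (256 hidden units, 50% dropout on the embeddings, learning rate 0.03, batch size 32, 200 epochs) come from the paper; the optimizer choice ('sgd'), the BLSTMEncoder class, and the exact placement of the dropout layer are assumptions made for illustration.

```python
# Sketch of the reported training configuration using MXNet Gluon.
# Numbers are from the paper; structural choices are assumptions.
from mxnet import gluon
from mxnet.gluon import nn, rnn

class BLSTMEncoder(nn.Block):
    """One of the BLSTM encoders over characters, words, or their radicals (hypothetical name)."""
    def __init__(self, hidden_size=256, dropout=0.5, **kwargs):
        super().__init__(**kwargs)
        self.drop = nn.Dropout(dropout)                    # 50% dropout on the input embeddings
        self.blstm = rnn.LSTM(hidden_size, bidirectional=True)

    def forward(self, embeddings):
        return self.blstm(self.drop(embeddings))

encoder = BLSTMEncoder()
encoder.initialize()
trainer = gluon.Trainer(encoder.collect_params(), 'sgd',   # optimizer not stated in the paper
                        {'learning_rate': 0.03})           # learning rate from the paper
BATCH_SIZE, EPOCHS = 32, 200                               # batch size and epochs from the paper
```

Evaluation would then report the standard Precision, Recall, and F1-measure over the test split, as cited above.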