Label Confusion Learning to Enhance Text Classification Models

Authors: Biyang Guo, Songqiao Han, Xiao Han, Hailiang Huang, Ting Lu (pp. 12929-12936)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on five text classification benchmark datasets reveal the effectiveness of LCM for several widely used deep learning classification models. Further experiments also verify that LCM is especially helpful for confused or noisy datasets and superior to the label smoothing method.
Researcher Affiliation | Academia | AI Lab, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China, 200433; guobiyang2020@gmail.com, {han.songqiao, xiaohan, hlhuang}@shufe.edu.cn, luting@189.cn
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | The 20NG dataset (bydata version; https://www.cs.umb.edu/~smimarog/textmining/datasets/) is an English news dataset that contains 18846 documents evenly categorized into 20 different categories. The AG's News dataset (http://www.di.unipi.it/~gulli) was constructed by Xiang Zhang (Zhang, Zhao, and LeCun 2015) and contains 127600 samples in 4 classes. The DBPedia dataset (http://dbpedia.org) was also created by Xiang Zhang (Zhang, Zhao, and LeCun 2015). The FDCNews dataset (http://www.nlpir.org) is provided by Fudan University and contains 9833 Chinese news articles categorized into 20 different classes. The THUCNews dataset (http://thuctc.thunlp.org) is a Chinese news classification dataset collected by Tsinghua University.
Dataset Splits | Yes | Most of the datasets have already been split into train and test sets. However, a different split can directly affect the final performance of the model. Therefore, in our experiments, we combine the separated train and test sets into one dataset and randomly split it into different train and test sets 10 times with a splitting ratio of 7:3. (See the re-split sketch below this table.)
Hardware Specification | Yes | The model is implemented using Keras and is trained on a GeForce GTX 1070 Ti GPU.
Software Dependencies | No | The paper states 'The model is implemented using Keras' but does not provide specific version numbers for Keras or any other software dependencies.
Experiment Setup | Yes | Settings: For LSTM, we set the embedding size and hidden size to 64. For CNN, we use 3 filters with sizes 3, 10 and 25, and the number of filters for each convolution block is 100. For both LSTM and CNN models, the embedding size is 64 if no pre-trained word embeddings are used; otherwise, the embedding size is 250 for Chinese tasks and 100 for English tasks. ... In our main experiments we simply set α = 4 as a moderate choice. ... We train our model's parameters with the Adam optimizer (Kingma and Ba 2014) with an initial learning rate of 0.001 and a batch size of 128. (A baseline configuration sketch follows this table.)
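
The re-split protocol quoted in the Dataset Splits row can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `texts`, `labels`, and the helper name are placeholders, and scikit-learn's `train_test_split` stands in for whatever splitting routine the authors actually used.

```python
# Hypothetical sketch of the combine-and-resplit protocol (7:3 ratio, repeated 10 times).
from sklearn.model_selection import train_test_split

def make_resplits(texts, labels, n_runs=10, test_size=0.3, base_seed=0):
    """Merge the original train/test sets beforehand, then draw 10 random 7:3 splits."""
    splits = []
    for run in range(n_runs):
        X_train, X_test, y_train, y_test = train_test_split(
            texts, labels, test_size=test_size, random_state=base_seed + run
        )
        splits.append((X_train, X_test, y_train, y_test))
    return splits
```

Averaging results over the 10 resulting train/test pairs reduces the variance that any single split would introduce, which is the rationale the quoted passage gives for re-splitting.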
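Similarly, the hyperparameters quoted in the Experiment Setup row translate into a compact Keras configuration. The sketch below only mirrors the reported settings for the LSTM baseline (embedding and hidden size 64, Adam with learning rate 0.001, batch size 128); `vocab_size` and `num_classes` are placeholder values, and this is not the authors' implementation.

```python
# Illustrative LSTM baseline using the reported hyperparameters (placeholders noted).
from tensorflow.keras import layers, models, optimizers

vocab_size, num_classes = 20000, 20  # placeholder values, dataset-dependent

model = models.Sequential([
    layers.Embedding(vocab_size, 64),  # embedding size 64 (no pre-trained vectors)
    layers.LSTM(64),                   # hidden size 64
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),  # reported initial learning rate
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(X_train, y_train, batch_size=128, ...)  # reported batch size of 128
```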