Merging Statistical Feature via Adaptive Gate for Improved Text Classification

Authors: Xianming Li, Zongxi Li, Haoran Xie, Qing Li (pp. 13288-13296)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on datasets of various scales show that, by incorporating statistical information, AGN can improve the classification performance of CNN, RNN, Transformer, and Bert based models effectively.
Researcher Affiliation | Collaboration | Xianming Li, 1 Ant Group, Shanghai, China; 2 Department of Computer Science, City University of Hong Kong, Hong Kong SAR; 3 Department of Computing and Decision Sciences, Lingnan University, Hong Kong SAR; 4 Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper.
Open Source Code | Yes | Code available at https://github.com/4AI/AGN
Open Datasets | Yes | We test the proposed model on the following datasets (with summary statistics in Table 2). Subj (Pang and Lee 2004) is a dataset of subjectivity. SST-1 (Socher et al. 2013) is the Stanford Sentiment Treebank dataset... TREC (Li and Roth 2002)... AG's News (Zhang, Zhao, and LeCun 2015)... Yelp Review Full (Yelp F.)
Dataset Splits | Yes | Subj (Pang and Lee 2004)... We deploy 10-fold cross-validation on the dataset without standard train/test split (i.e., Subj). For datasets with standard split, we run ten trials and report the average results. A sketch of this evaluation protocol appears after the table.
Hardware Specification | Yes | a CNN+AGN only requires 3,250 additional parameters and 0.13 seconds more per epoch on training time, compared with a standard Text CNN (on SST-2 with an RTX 2080 Ti GPU).
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were explicitly stated.
Experiment Setup | Yes | The CNN-based models have a filter size of [3, 4, 5] with 100 filters each, and the RNN-based models have a hidden dimension of 128. For the Transformer, we use an encoder with 8 heads and 3 blocks. The employed Bert model is the Bert-base Uncased, including 12 layers, 768 hidden units, and 110M parameters. We adopt Adam optimizer with a batch size of 64 for non-Bert models and 16 for Bert models. The dropout rate is set to 0.5. An illustrative configuration sketch follows below the table.
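The split protocol quoted in the Dataset Splits row can be sketched roughly as follows. This is an illustrative reconstruction only: `build_and_train` and `evaluate` are hypothetical placeholders rather than functions from the AGN repository, and the data are assumed to be NumPy arrays.

```python
# Illustrative sketch of the reported evaluation protocol:
# 10-fold cross-validation for Subj (no standard split) and ten averaged
# trials for datasets that ship with a train/test split.
import numpy as np
from sklearn.model_selection import KFold


def cross_validate(texts, labels, build_and_train, evaluate, n_splits=10, seed=42):
    """Average score over a 10-fold split (used for Subj)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(texts):
        model = build_and_train(texts[train_idx], labels[train_idx])
        scores.append(evaluate(model, texts[test_idx], labels[test_idx]))
    return float(np.mean(scores))


def repeated_trials(train_set, test_set, build_and_train, evaluate, n_trials=10):
    """Average score over ten independent runs (used for standard splits)."""
    scores = [evaluate(build_and_train(*train_set), *test_set) for _ in range(n_trials)]
    return float(np.mean(scores))
```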
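The non-BERT hyperparameters in the Experiment Setup row (filter sizes [3, 4, 5] with 100 filters each, dropout 0.5, Adam, batch size 64) correspond to a standard TextCNN backbone. The sketch below is a minimal Keras reconstruction of such a baseline, not the authors' AGN implementation; vocabulary size, embedding dimension, sequence length, and class count are assumed placeholder values.

```python
# Minimal TextCNN sketch using the hyperparameters reported above.
# VOCAB_SIZE, EMBED_DIM, MAX_LEN, and NUM_CLASSES are assumptions, not
# values taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 300, 100, 2  # assumed

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# One convolution branch per filter size [3, 4, 5], 100 filters each,
# global-max-pooled and concatenated.
branches = []
for k in (3, 4, 5):
    conv = layers.Conv1D(filters=100, kernel_size=k, activation="relu")(x)
    branches.append(layers.GlobalMaxPooling1D()(conv))
x = layers.Concatenate()(branches)

x = layers.Dropout(0.5)(x)  # dropout rate reported in the paper
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",  # Adam optimizer as reported
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training would use the reported batch size for non-BERT models, e.g.:
# model.fit(x_train, y_train, batch_size=64, epochs=..., validation_data=...)
```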