Merging Statistical Feature via Adaptive Gate for Improved Text Classification
Authors: Xianming Li, Zongxi Li, Haoran Xie, Qing Li
AAAI 2021, pp. 13288-13296 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on datasets of various scales show that, by incorporating statistical information, AGN can improve the classification performance of CNN, RNN, Transformer, and BERT-based models effectively. |
| Researcher Affiliation | Collaboration | Affiliations: (1) Ant Group, Shanghai, China; (2) Department of Computer Science, City University of Hong Kong, Hong Kong SAR; (3) Department of Computing and Decision Sciences, Lingnan University, Hong Kong SAR; (4) Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | Code available at https://github.com/4AI/AGN |
| Open Datasets | Yes | We test the proposed model on the following datasets (with summary statistics in Table 2). Subj (Pang and Lee 2004) is a dataset of subjectivity. SST-1 (Socher et al. 2013) is the Stanford Sentiment Treebank dataset... TREC (Li and Roth 2002)... AG's News (Zhang, Zhao, and LeCun 2015)... Yelp Review Full (Yelp F.) |
| Dataset Splits | Yes | Subj (Pang and Lee 2004)... We deploy 10-fold cross-validation on the dataset without standard train/test split (i.e., Subj). For datasets with standard split, we run ten trials and report the average results. (A sketch of this protocol appears after the table.) |
| Hardware Specification | Yes | a CNN+AGN only requires 3,250 additional parameters and 0.13 seconds more per epoch on training time, compared with a standard Text CNN (on SST-2 with an RTX 2080 Ti GPU). |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were explicitly stated. |
| Experiment Setup | Yes | The CNN-based models have a filter size of [3, 4, 5] with 100 filters each, and the RNN-based models have a hidden dimension of 128. For the Transformer, we use an encoder with 8 heads and 3 blocks. The employed BERT model is BERT-base Uncased, including 12 layers, 768 hidden units, and 110M parameters. We adopt the Adam optimizer with a batch size of 64 for non-BERT models and 16 for BERT models. The dropout rate is set to 0.5. |
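
As a concrete starting point for reproduction, below is a minimal sketch of the quoted non-BERT setup in tf.keras. The filter sizes [3, 4, 5] with 100 filters each, dropout 0.5, the Adam optimizer, and the batch size of 64 come from the Experiment Setup row; the library choice, vocabulary size, embedding dimension, sequence length, and epoch count are assumptions, and AGN's statistical-feature gate itself is not reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_text_cnn(vocab_size=20000, embed_dim=300, max_len=100, num_classes=2):
    """TextCNN baseline roughly matching the reported setup:
    filter sizes [3, 4, 5] with 100 filters each and dropout 0.5."""
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # One 1D convolution per filter size, each followed by max-over-time pooling.
    pooled = [
        layers.GlobalMaxPooling1D()(layers.Conv1D(100, k, activation="relu")(x))
        for k in (3, 4, 5)
    ]
    h = layers.Dropout(0.5)(layers.Concatenate()(pooled))
    outputs = layers.Dense(num_classes, activation="softmax")(h)
    model = Model(inputs, outputs)
    # Adam optimizer as reported; the learning rate is not stated in the quoted setup.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training with the reported batch size for non-BERT models:
# model.fit(x_train, y_train, batch_size=64, epochs=10, validation_data=(x_val, y_val))
```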
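
The evaluation protocol quoted in the Dataset Splits row (10-fold cross-validation for Subj, ten averaged trials for datasets with a standard split) can likewise be sketched as follows. `train_and_score` is a hypothetical helper that trains a model and returns its test accuracy, and the random seed is an assumption.

```python
import numpy as np
from sklearn.model_selection import KFold

def evaluate(texts, labels, train_and_score, standard_split=None, n_runs=10):
    """Average accuracy under the reported protocol.

    texts, labels: full dataset as NumPy arrays.
    train_and_score(x_tr, y_tr, x_te, y_te) -> float accuracy (hypothetical helper).
    standard_split: (x_tr, y_tr, x_te, y_te) if the dataset ships with one, else None.
    """
    if standard_split is not None:
        # Datasets with a standard split: run ten trials and average the results.
        scores = [train_and_score(*standard_split) for _ in range(n_runs)]
    else:
        # Datasets without a standard split (e.g., Subj): 10-fold cross-validation.
        kf = KFold(n_splits=10, shuffle=True, random_state=42)  # seed is an assumption
        scores = [
            train_and_score(texts[tr], labels[tr], texts[te], labels[te])
            for tr, te in kf.split(texts)
        ]
    return float(np.mean(scores))
```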