ACT: an Attentive Convolutional Transformer for Efficient Text Classification
Authors: Pengfei Li, Peixiang Zhong, Kezhi Mao, Dongzhe Wang, Xuefeng Yang, Yunfeng Liu, Jianxiong Yin, Simon See (pp. 13261-13269)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various text classification tasks and detailed analyses show that ACT is a lightweight, fast, and effective universal text classifier, outperforming CNNs, RNNs, and attentive models including Transformer. |
| Researcher Affiliation | Collaboration | Pengfei Li (1), Peixiang Zhong (1), Kezhi Mao (1)*, Dongzhe Wang (2), Xuefeng Yang (2), Yunfeng Liu (2), Jianxiong Yin (3), Simon See (3); (1) Nanyang Technological University, Singapore; (2) Zhui Yi Technology, Shenzhen, China; (3) NVIDIA AI Tech Center |
| Pseudocode | No | The paper includes figures illustrating the architecture and mathematical equations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the ACT model. |
| Open Datasets | Yes | We use six widely-studied datasets to evaluate our model, two for each text classification task. These datasets are diverse in the aspects of type, size, number of classes, and document length. Table 1 shows the statistics of the datasets. For sentiment analysis, we use two datasets constructed by Zhang et al. (2015)... For topic categorization, we use AG's News (AGNews) and DBPedia datasets created by Zhang et al. (2015)... For relation extraction, we use TACRED and SemEval-2010 Task 8 (SemEval) datasets... |
| Dataset Splits | Yes | For sentiment analysis and topic categorization, we set aside 10% of training data as the development set to tune model hyperparameters. |
| Hardware Specification | Yes | We report the average time needed to compute a single batch (batch size of 100) of Yelp F. dataset using NVIDIA Tesla P40 GPU with Intel Xeon E5-2667 CPU. |
| Software Dependencies | No | The paper mentions several techniques and components like 'GloVe word embeddings', 'Dropout regularization', 'GELUs', and 'center loss', but it does not specify any software names with version numbers (e.g., PyTorch version, Python version, etc.) that would allow for reproducible setup. |
| Experiment Setup | Yes | In our experiments, word embedding matrix W^wrd is initialized with 300-d GloVe word embeddings (Pennington, Socher, and Manning 2014). The fully connected layer before softmax has a dimension of 100. Dropout regularization (Srivastava et al. 2014) with a rate of 0.4 is applied during training. The weight and learning rate for center loss are 0.001 and 0.1 respectively. The models are trained using SGD with initial learning rate of 0.01 and momentum of 0.9. Learning rate is decayed with a rate of 0.9 after 10 epochs if the score on the development set does not improve. Batch size is set to 100 and the model is trained for 70 epochs. The dimensions of global attention and position embedding are 200 and 60 respectively. We use GELUs (Hendrycks and Gimpel 2016) for all the nonlinear activation functions. |
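The experiment-setup row above can be collected into a reusable configuration. The sketch below is our own illustrative reconstruction, not code from the paper: the dictionary keys and the helper `decay_lr_on_plateau` are hypothetical names, but the values match the hyperparameters the paper reports (SGD, lr 0.01, momentum 0.9, decay factor 0.9 after epoch 10 on a stalled dev score, batch size 100, 70 epochs).

```python
# Hypothetical summary of the ACT training hyperparameters reported in the
# paper. All identifiers here are our own naming, not the authors'.
ACT_HPARAMS = {
    "word_embedding": "GloVe-300d",   # initialization of W^wrd
    "fc_dim": 100,                    # fully connected layer before softmax
    "dropout": 0.4,
    "center_loss_weight": 0.001,
    "center_loss_lr": 0.1,
    "optimizer": "SGD",
    "lr": 0.01,
    "momentum": 0.9,
    "lr_decay": 0.9,                  # applied after epoch 10 on plateau
    "batch_size": 100,
    "epochs": 70,
    "global_attention_dim": 200,
    "position_embedding_dim": 60,
    "activation": "GELU",
}

def decay_lr_on_plateau(lr, epoch, improved, decay=0.9, warmup_epochs=10):
    """Decay the learning rate by `decay` after `warmup_epochs` epochs
    whenever the development-set score did not improve; otherwise keep it."""
    if epoch > warmup_epochs and not improved:
        return lr * decay
    return lr
```

For example, at epoch 11 with no dev-set improvement the rate drops from 0.01 to 0.009, while during the first 10 epochs it stays fixed regardless of the score.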