Multi-Scale Self-Attention for Text Classification
Authors: Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
AAAI 2020, pp. 7847-7854 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results of three different kinds of tasks (21 datasets) show our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate size datasets. |
| Researcher Affiliation | Collaboration | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; School of Computer Science, Fudan University; AWS Shanghai AI Lab; New York University Shanghai; {qpguo16, xpqiu, pfliu14, xyxue}@fudan.edu.cn, zz@nyu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We implement the MS-Trans with Pytorch and DGL (Wang et al. 2019)' with a footnote linking to 'https://pytorch.org'. This link points to a third-party library, not to the authors' own source code for their methodology. |
| Open Datasets | Yes | We evaluate our model on 17 text classification datasets, 3 sequence labeling datasets and 1 natural language inference dataset. All the statistics can be found in Tab-1. (Table 1 lists datasets such as SST (Socher et al. 2013), MTL-16 (Liu, Qiu, and Huang 2017), PTB POS (Marcus, Santorini, and Marcinkiewicz 1993), CoNLL03 (Sang and Meulder 2003), CoNLL2012 NER (Pradhan et al. 2012), and SNLI (Bowman et al. 2015).) |
| Dataset Splits | Yes | Table 1: An overall of datasets and its hyper-parameters... The table reports Train / Dev. / Test / \|V\| sizes per dataset (e.g., SST: 8k train, 1k dev, 2k test; MTL-16: 1400 / 200 / 400) alongside H DIM, α, and head DIM. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | The paper mentions 'Pytorch' and 'DGL' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 1: An overall of datasets and its hyper-parameters, where H DIM, α, and head DIM indicate the dimension of hidden states, the hyper-parameter controlling the scale distribution, and the dimension of each head, respectively. The optimizer is Adam (Kingma and Ba 2014) and the learning rate and dropout ratio are listed in the Appendix. A minimal configuration sketch based on this description follows the table. |
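
The sketch below illustrates, under stated assumptions, the kind of setup the paper describes: a per-dataset hidden size (H DIM), a per-head size (head DIM), an Adam optimizer, and a dropout ratio. The numeric values, the two-layer depth, and the use of a stock `nn.TransformerEncoderLayer` are placeholders of ours, not the authors' Multi-Scale Transformer or their Appendix hyper-parameters.

```python
import torch
import torch.nn as nn

# Hypothetical hyper-parameters mirroring the Table 1 columns (H DIM, head DIM)
# and the Adam / learning-rate / dropout setup the paper defers to its Appendix.
# The concrete values below are placeholders, NOT the authors' numbers.
H_DIM = 300        # hidden-state dimension ("H DIM")
HEAD_DIM = 30      # per-head dimension ("head DIM")
N_HEADS = H_DIM // HEAD_DIM
DROPOUT = 0.1      # placeholder; actual ratios are listed in the paper's Appendix
LR = 1e-3          # placeholder; actual learning rates are listed in the paper's Appendix

# A plain nn.TransformerEncoderLayer stands in for the (unreleased) Multi-Scale
# Transformer layer; the multi-scale attention itself is not reproduced here.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=H_DIM,
    nhead=N_HEADS,
    dim_feedforward=4 * H_DIM,
    dropout=DROPOUT,
)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)

# The paper states the optimizer is Adam (Kingma and Ba 2014).
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

# Dummy forward/backward pass to confirm the wiring: (seq_len, batch, H_DIM).
x = torch.randn(35, 16, H_DIM)
loss = model(x).mean()
loss.backward()
optimizer.step()
```

Reproducing the paper exactly would require substituting the multi-scale attention layer (with its α-controlled scale distribution) for the standard encoder layer, along with the per-dataset learning rates and dropout ratios from the Appendix.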