Poolingformer: Long Document Modeling with Pooling Attention
Authors: Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance. |
| Researcher Affiliation | Collaboration | 1. College of Computer Science, Sichuan University; 2. During internship at MSRA; 3. Microsoft Research Asia; 4. Microsoft Azure AI; 5. University of Science and Technology of China. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | For QA, we report the results on the monolingual Natural Questions (NQ) and the multilingual TyDi QA. For long document summarization, we report the results on the arXiv dataset (Cohan et al., 2018). Natural Questions: This dataset collected real questions in Google's search engine. Each question is paired with a Wikipedia page. [...] https://ai.google.com/research/NaturalQuestions/dataset. TyDi QA: TyDi QA is a multilingual question answering dataset [...] https://ai.google.com/research/tydiqa. arXiv: arXiv (Cohan et al., 2018) is a long document summarization dataset collected from the scientific repository arxiv.org. |
| Dataset Splits | Yes | For NQ and TyDi QA, we split documents into multiple spans with a sliding window approach (Alberti et al., 2019). The size and stride of the sliding window are set to 4,096 and 1,568, respectively. Each instance is formed by a start placeholder, a question, and a document span. The question and the document span are separated by a special placeholder. Since many instances contain no answer, the numbers of negative and positive instances are imbalanced. We follow Liu et al. (2020) to sub-sample negative instances during training; the sub-sampling ratio is set to 0.5. (A preprocessing sketch based on this setup appears after the table.) |
| Hardware Specification | Yes | For all experiments, we use 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions 'Huggingface Transformers (Wolf et al., 2020) and Fairseq (Ott et al., 2019)' and 'Apex' but does not specify exact version numbers for these software dependencies. |
| Experiment Setup | Yes | The window sizes of the first-level and second-level attention are set to 128 and 512, respectively. The pooling kernel size and stride are set to 5 and 4, respectively. We use the Adam optimizer (Kingma & Ba, 2015) with linear learning rate decay. The batch size, the number of training epochs, the learning rate, and the learning rate warmup proportion are set to 64, 2, 2×10⁻⁵, and 0.1, respectively. (A configuration sketch based on these hyperparameters appears after the table.) |
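
The Dataset Splits row describes a sliding-window preprocessing step (window 4,096, stride 1,568, negative sub-sampling ratio 0.5). Below is a minimal Python sketch of that step under stated assumptions: the token ids, the placeholder ids, and the `answer_range` argument are hypothetical stand-ins for illustration, not the paper's actual preprocessing code.

```python
import random


def split_into_spans(question_ids, doc_ids, answer_range=None,
                     window=4096, stride=1568, neg_keep_ratio=0.5,
                     cls_id=0, sep_id=1):
    """Sliding-window splitting, following the setup quoted above.

    Each instance is [start placeholder] + question + [separator] + doc span;
    spans that do not cover the answer are kept with probability
    `neg_keep_ratio` (0.5 in the paper).
    """
    instances = []
    span_budget = window - len(question_ids) - 2  # reserve the two placeholders
    for start in range(0, max(len(doc_ids) - span_budget, 0) + 1, stride):
        end = start + span_budget
        span = doc_ids[start:end]
        is_positive = (answer_range is not None
                       and start <= answer_range[0] and answer_range[1] <= end)
        if not is_positive and random.random() > neg_keep_ratio:
            continue  # sub-sample negative (no-answer) spans
        instances.append([cls_id] + question_ids + [sep_id] + span)
    return instances


# Hypothetical usage: a 20-token question, a 10,000-token document,
# and an answer located at token positions 5,000-5,010.
spans = split_into_spans(list(range(20)), list(range(10000)),
                         answer_range=(5000, 5010))
```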
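The Experiment Setup row lists the two-level window sizes, the pooling kernel/stride, and the fine-tuning hyperparameters. The sketch below illustrates them with PyTorch and Huggingface Transformers; the mean-pooling choice, the head dimension, the stand-in model, and `steps_per_epoch` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from transformers import get_linear_schedule_with_warmup

# Two-level window and pooling settings from the Experiment Setup row.
FIRST_LEVEL_WINDOW = 128      # first-level sliding-window attention
SECOND_LEVEL_WINDOW = 512     # second-level window whose keys/values are pooled
POOL_KERNEL, POOL_STRIDE = 5, 4

# Second-level compression: pool the keys of a 512-token window down to
# roughly 128 positions. Mean pooling is used here only as an example of a
# pooling function; the head dimension (64) is likewise an assumption.
keys = torch.randn(1, 64, SECOND_LEVEL_WINDOW)   # (batch, head_dim, window)
pooled_keys = F.avg_pool1d(keys, kernel_size=POOL_KERNEL, stride=POOL_STRIDE)
print(pooled_keys.shape)                          # torch.Size([1, 64, 127])

# Optimizer and linear-decay schedule with the quoted fine-tuning settings.
BATCH_SIZE, EPOCHS, LR, WARMUP_PROPORTION = 64, 2, 2e-5, 0.1
model = torch.nn.Linear(8, 2)                     # stand-in for the fine-tuned model
steps_per_epoch = 1000                            # assumed; depends on dataset size
total_steps = EPOCHS * steps_per_epoch
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(WARMUP_PROPORTION * total_steps),
    num_training_steps=total_steps,
)
```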