CogLTX: Applying BERT to Long Texts
Authors: Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. Our experiments demonstrate that CogLTX outperforms or achieves comparable performance with the state-of-the-art results on four tasks, including NewsQA [44], HotpotQA [53], 20NewsGroups [22] and Alibaba, with constant memory consumption regardless of the length of text. |
| Researcher Affiliation | Collaboration | Ming Ding (Tsinghua University, dm18@mails.tsinghua.edu.cn); Chang Zhou (Alibaba Group, ericzhou.zc@alibaba-inc.com); Hongxia Yang (Alibaba Group, yang.yhx@alibaba-inc.com); Jie Tang (Tsinghua University, jietang@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1: The Training Algorithm of CogLTX (a hedged sketch of this loop appears below the table) |
| Open Source Code | Yes | Codes are available at https://github.com/Sleepychord/CogLTX. |
| Open Datasets | Yes | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. NewsQA [44]: A. Trischler, T. Wang, X. Yuan, J. Harris, A. Sordoni, P. Bachman, and K. Suleman. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 191–200, 2017. HotpotQA [53]: Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, 2018. 20NewsGroups [22]: K. Lang. Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pages 331–339, 1995. |
| Dataset Splits | Yes | Table 2: Results on HotpotQA distractor (dev). |
| Hardware Specification | Yes | The data about memory are measured with batch size = 1 on a Tesla V100. |
| Software Dependencies | No | The paper mentions using Adam [18] for finetuning but does not specify version numbers for any key software components, libraries, or programming languages. |
| Experiment Setup | Yes | In all experiments, the judge and reasoner are finetuned by Adam [18] with learning rates 4×10⁻⁵ and 10⁻⁴ respectively. The learning rates warm up over the first 10% of steps, and then linearly decay to 1/10 of the max learning rates. The common hyperparameters are batch size = 32, strides = [3, 5], t_up = 0.2 and t_down = 0.05 (a sketch of this schedule follows below). |
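
The paper specifies its training procedure as pseudocode (Algorithm 1). Below is a rough, non-authoritative PyTorch sketch of the judge/reasoner interplay that pseudocode describes: a judge scores the relevance of text blocks, the highest-scoring blocks are retained within a single BERT input budget, and a reasoner solves the task on those key blocks only. Here `split_into_blocks`, `relevance_labels`, and `num_tokens` are hypothetical helpers, and `judge`/`reasoner` stand in for the two finetuned BERT models; none of these names are the repository's actual API.

```python
import torch
import torch.nn.functional as F

def mem_recall(judge, blocks, budget=512):
    """Greedy sketch of key-block retrieval: keep the highest-scoring
    blocks until BERT's token budget is exhausted."""
    with torch.no_grad():
        scores = judge(blocks)                      # one relevance score per block
    kept, used = [], 0
    for i in torch.argsort(scores, descending=True):
        if used + num_tokens(blocks[i]) <= budget:  # num_tokens: hypothetical helper
            kept.append(blocks[i])
            used += num_tokens(blocks[i])
    return kept

def train_step(judge, reasoner, judge_opt, reasoner_opt, long_text, target):
    blocks = split_into_blocks(long_text)           # hypothetical sentence-level split

    # 1. Train the judge to score block relevance against (possibly noisy) labels.
    judge_loss = F.binary_cross_entropy_with_logits(
        judge(blocks), relevance_labels(blocks, target))
    judge_opt.zero_grad(); judge_loss.backward(); judge_opt.step()

    # 2. Retrieve the key blocks that fit in a single BERT input.
    key_blocks = mem_recall(judge, blocks)

    # 3. Train the reasoner on the key blocks only, which is what keeps
    #    memory constant regardless of the original text length.
    reasoner_loss = reasoner(key_blocks, target)
    reasoner_opt.zero_grad(); reasoner_loss.backward(); reasoner_opt.step()
```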
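
The reported optimization schedule (warmup over the first 10% of steps, then linear decay to 1/10 of the peak learning rate) maps directly onto a standard PyTorch `LambdaLR`. A minimal sketch under that reading, with the peak learning rates taken from the setup above; `total_steps` and the `judge`/`reasoner` model objects are placeholders, not values from the paper:

```python
import torch

def warmup_linear_decay(optimizer, total_steps, warmup_frac=0.1, floor=0.1):
    """Warm up over the first 10% of steps, then decay linearly to
    1/10 of the peak learning rate, as described in the paper's setup."""
    warmup_steps = max(1, int(total_steps * warmup_frac))

    def lr_lambda(step):
        if step < warmup_steps:
            return step / warmup_steps                                # 0 -> 1
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 1.0 - (1.0 - floor) * progress                         # 1 -> 0.1

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Peak learning rates from the reported setup: judge 4e-5, reasoner 1e-4.
judge_opt = torch.optim.Adam(judge.parameters(), lr=4e-5)
reasoner_opt = torch.optim.Adam(reasoner.parameters(), lr=1e-4)
judge_sched = warmup_linear_decay(judge_opt, total_steps=10_000)      # total_steps assumed
reasoner_sched = warmup_linear_decay(reasoner_opt, total_steps=10_000)
```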