CogLTX: Applying BERT to Long Texts

Authors: Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. Our experiments demonstrate that CogLTX outperforms or achieves comparable performance with the state-of-the-art results on four tasks, including NewsQA [44], HotpotQA [53], 20NewsGroups [22] and Alibaba, with constant memory consumption regardless of the length of text.
Researcher Affiliation | Collaboration | Ming Ding (Tsinghua University, dm18@mails.tsinghua.edu.cn); Chang Zhou (Alibaba Group, ericzhou.zc@alibaba-inc.com); Hongxia Yang (Alibaba Group, yang.yhx@alibaba-inc.com); Jie Tang (Tsinghua University, jietang@tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1: The Training Algorithm of CogLTX (a hypothetical sketch of this loop follows the table)
Open Source Code | Yes | Codes are available at https://github.com/Sleepychord/CogLTX.
Open Datasets | Yes | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. NewsQA [44]: A. Trischler, T. Wang, X. Yuan, J. Harris, A. Sordoni, P. Bachman, and K. Suleman. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 191-200, 2017. HotpotQA [53]: Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369-2380, 2018. 20NewsGroups [22]: K. Lang. NewsWeeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pages 331-339, 1995.
Dataset Splits | Yes | Table 2: Results on HotpotQA distractor (dev).
Hardware Specification | Yes | The data about memory are measured with batch size = 1 on a Tesla V100 (a measurement sketch follows the table).
Software Dependencies | No | The paper mentions using Adam [18] for finetuning but does not specify version numbers for any key software components, libraries, or programming languages.
Experiment Setup | Yes | In all experiments, the judge and reasoner are finetuned by Adam [18] with learning rates 4×10^-5 and 10^-4 respectively. The learning rates warm up over the first 10% of steps, and then linearly decay to 1/10 of the max learning rates. The common hyperparameters are batch size = 32, strides = [3, 5], t_up = 0.2 and t_down = 0.05. (A configuration sketch follows the table.)
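
The Pseudocode row refers to Algorithm 1, which alternates between retrieving key blocks with the judge (MemRecall) and finetuning the judge and the reasoner on the retrieved short sequence. Below is a minimal, hypothetical Python sketch of that loop; `mem_recall`, `judge`, `reasoner`, and the block/sample objects are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the CogLTX training loop (cf. Algorithm 1).
# `judge.score`, `judge.loss`, `reasoner.loss` and the block objects are
# assumed interfaces for illustration only, not the authors' API.
import torch

def mem_recall(judge, query, blocks, capacity=512):
    """Greedy retrieval sketch: keep the highest-scoring blocks until the
    concatenated key sequence z reaches the BERT length limit."""
    scores = judge.score(query, blocks)            # one relevance score per block
    z, used = [], 0
    for idx in torch.argsort(scores, descending=True).tolist():
        block = blocks[idx]
        if used + len(block.tokens) <= capacity:
            z.append(block)
            used += len(block.tokens)
    return z

def train_step(sample, judge, reasoner, opt_judge, opt_reasoner):
    # 1) Retrieve the key blocks z for this long text.
    z = mem_recall(judge, sample.query, sample.blocks)

    # 2) Update the judge with block-level relevance supervision
    #    (gold spans or intervention-derived labels in the paper).
    loss_j = judge.loss(sample.query, z, sample.relevance_labels)
    opt_judge.zero_grad(); loss_j.backward(); opt_judge.step()

    # 3) Update the reasoner on the short retrieved sequence z only,
    #    so memory stays constant regardless of the original text length.
    loss_r = reasoner.loss(sample.query, z, sample.label)
    opt_reasoner.zero_grad(); loss_r.backward(); opt_reasoner.step()
```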
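
The Experiment Setup row describes the optimization schedule (Adam, linear warmup over the first 10% of steps, then linear decay to 1/10 of the peak learning rate). A minimal sketch of that schedule in PyTorch, assuming generic model modules and a known total step count:

```python
# Sketch of the reported optimization setup; the model objects are placeholders.
import torch
from torch.optim.lr_scheduler import LambdaLR

def make_optimizer(model, max_lr, total_steps, warmup_frac=0.1, floor=0.1):
    opt = torch.optim.Adam(model.parameters(), lr=max_lr)
    warmup_steps = int(warmup_frac * total_steps)

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)                    # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 1.0 - (1.0 - floor) * progress                     # decay to 1/10 of max_lr

    return opt, LambdaLR(opt, lr_lambda)

# Paper values: judge LR 4e-5, reasoner LR 1e-4, batch size 32.
# opt_judge, sched_judge = make_optimizer(judge, 4e-5, total_steps)
# opt_reasoner, sched_reasoner = make_optimizer(reasoner, 1e-4, total_steps)
```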
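
The Hardware Specification row reports memory measured with batch size = 1 on a Tesla V100. One way to reproduce such a peak-memory measurement with PyTorch's CUDA counters, with `model` and `inputs` as placeholders (the paper does not publish this script):

```python
# Illustrative peak-memory measurement for a single example (batch size = 1).
import torch

torch.cuda.reset_peak_memory_stats()
outputs = model(**inputs)            # one forward pass with batch size 1
loss = outputs.loss                  # assumed attribute; adapt to the task head
loss.backward()                      # include backward to capture training memory
peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak GPU memory: {peak_gib:.2f} GiB")
```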