Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
CogLTX: Applying BERT to Long Texts
Authors: Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. Our experiments demonstrate that CogLTX outperforms or achieves comparable performance with the state-of-the-art results on four tasks, including NewsQA [44], HotpotQA [53], 20Newsgroups [22] and Alibaba, with constant memory consumption regardless of the length of text. |
| Researcher Affiliation | Collaboration | Ming Ding (Tsinghua University), Chang Zhou (Alibaba Group), Hongxia Yang (Alibaba Group), Jie Tang (Tsinghua University) |
| Pseudocode | Yes | Algorithm 1: The Training Algorithm of CogLTX |
| Open Source Code | Yes | Codes are available at https://github.com/Sleepychord/CogLTX. |
| Open Datasets | Yes | We conducted experiments on four long-text datasets with different tasks. The boxplot in Figure 5 illustrates the statistics of the text length in the datasets. NewsQA [44]: A. Trischler, T. Wang, X. Yuan, J. Harris, A. Sordoni, P. Bachman, and K. Suleman. NewsQA: A machine comprehension dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP, pages 191-200, 2017. HotpotQA [53]: Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369-2380, 2018. 20Newsgroups [22]: K. Lang. NewsWeeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pages 331-339, 1995. |
| Dataset Splits | Yes | Table 2: Results on HotpotQA distractor (dev). |
| Hardware Specification | Yes | The data about memory are measured with batch size = 1 on a Tesla V100. |
| Software Dependencies | No | The paper mentions using Adam [18] for finetuning but does not specify version numbers for any key software components, libraries, or programming languages. |
| Experiment Setup | Yes | In all experiments, the judge and reasoner are finetuned by Adam [18] with learning rates 4 × 10⁻⁵ and 10⁻⁴, respectively. The learning rates warm up over the first 10% of steps and then linearly decay to 1/10 of the max learning rates. The common hyperparameters are batch size = 32, strides = [3, 5], t_up = 0.2 and t_down = 0.05. |
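The learning-rate schedule in the Experiment Setup row (warmup over the first 10% of steps, then linear decay to 1/10 of the peak) can be expressed as a small pure function. This is a minimal sketch under stated assumptions, not the CogLTX implementation: it assumes the warmup is linear and starts from 0, that steps are 0-indexed, and that the decay reaches exactly 1/10 of the peak at the final step. The function name `lr_at_step` and its parameters are hypothetical.

```python
def lr_at_step(step: int, total_steps: int, max_lr: float,
               warmup_frac: float = 0.1, final_ratio: float = 0.1) -> float:
    """Hypothetical helper: learning rate at a 0-indexed optimizer step.

    Assumes linear warmup from 0 to max_lr over the first warmup_frac of
    steps, followed by linear decay to final_ratio * max_lr at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Number of warmup steps (10% of training by default); at least 1.
    warmup_steps = max(1, round(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear warmup: 0 at step 0, reaching max_lr at step == warmup_steps.
        return max_lr * step / warmup_steps
    # Linear decay phase: progress goes from 0 to 1 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Interpolate between max_lr (progress 0) and final_ratio * max_lr (progress 1).
    return max_lr * (1.0 - (1.0 - final_ratio) * progress)


if __name__ == "__main__":
    total = 1000  # assumed total number of training steps, for illustration only
    # Peak learning rates reported for the judge and the reasoner.
    for name, peak in (("judge", 4e-5), ("reasoner", 1e-4)):
        print(name, [lr_at_step(s, total, peak) for s in (0, 100, 550, 1000)])
```

Keeping the schedule as a pure function of the step makes it easy to check in isolation. In PyTorch, a schedule like this could be attached through `torch.optim.lr_scheduler.LambdaLR` by passing `lambda s: lr_at_step(s, total, 1.0)` as the multiplicative factor on each parameter group's base learning rate.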