Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multiway Attention Networks for Modeling Sentence Pairs

Authors: Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou

IJCAI 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that the proposed multiway attention networks improve the result on the Quora Question Pairs, SNLI, MultiNLI, and answer sentence selection task on the SQuAD dataset.
Researcher Affiliation	Collaboration	State Key Laboratory of Software Development Environment, Beihang University, China; Microsoft Research, Beijing, China; Peking University, Beijing, China
Pseudocode	No	The paper includes diagrams and descriptions of the model architecture and processes, but it does not provide any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code.
Open Source Code	No	The paper does not provide an explicit statement about the release of its source code, nor does it include a link to a code repository for the methodology described.
Open Datasets	Yes	Quora Question Pairs This dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary value indicating whether the two questions are paraphrase of each other. SNLI It is a natural language inference dataset [Bowman et al., 2015]. MultiNLI It is a natural language inference dataset [Williams et al., 2017]. SQuAD It is a reading comprehension dataset, where the answer to each question is a span of text from the corresponding passage [Rajpurkar et al., 2016].
Dataset Splits	Yes	Quora Question Pairs... We select 5,000 paraphrases and 5,000 non-paraphrases as the development set, and use another 5,000 paraphrases and 5,000 non-paraphrases as the test set. We keep the remaining instances as the training set. SNLI... we have 549,367 pairs for training, 9,842 pairs for development and 9,824 pairs for test. MultiNLI... This dataset contains 392,702 pairs for training, 9,815 matched pairs and 9,832 mismatched pairs for development, 9,796 matched pairs and 9,847 mismatched pairs for test. SQuAD... we split the 10,570 instances in the development set to 5,000 for development and 5,570 for test.
Hardware Specification	No	The paper does not specify the hardware used for running the experiments, such as specific GPU or CPU models, memory, or other detailed computing specifications.
Software Dependencies	No	The paper mentions using GloVe embeddings, a pre-trained language model (ELMo), GRU, AdaDelta, dropout, and the Stanford CoreNLP Toolkit, but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	We use 300-dimensional uncased pre-trained GloVe embeddings without update during training. Hidden vector length is set to 150 for all layers. We apply dropout between layers, with dropout rate 0.2. The model is optimized using AdaDelta with initial learning rate of 1.0.
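
The settings and split sizes quoted in the table can be gathered into a small, checkable record. The following is a minimal sketch only: the names `ExperimentSetup`, `SPLITS`, and `split_total` are hypothetical and do not come from the paper or from any released code (the table notes that none was released). The numeric values are copied from the "Experiment Setup" and "Dataset Splits" rows; the Quora training-set size is left out because the table does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as quoted in the 'Experiment Setup' row (field names are hypothetical)."""
    embedding_dim: int = 300            # 300-dimensional uncased GloVe embeddings
    embeddings_trainable: bool = False  # "without update during training"
    hidden_size: int = 150              # hidden vector length for all layers
    dropout_rate: float = 0.2           # dropout applied between layers
    optimizer: str = "AdaDelta"
    initial_learning_rate: float = 1.0


# Split sizes as quoted in the 'Dataset Splits' row.
SPLITS = {
    # 5,000 paraphrases + 5,000 non-paraphrases each; training size not stated.
    "Quora Question Pairs": {"dev": 10_000, "test": 10_000},
    "SNLI": {"train": 549_367, "dev": 9_842, "test": 9_824},
    "MultiNLI": {
        "train": 392_702,
        "dev_matched": 9_815,
        "dev_mismatched": 9_832,
        "test_matched": 9_796,
        "test_mismatched": 9_847,
    },
    # The original SQuAD development set, re-split by the authors.
    "SQuAD": {"dev": 5_000, "test": 5_570},
}


def split_total(dataset: str) -> int:
    """Sum the sizes of all listed splits for one dataset."""
    return sum(SPLITS[dataset].values())


if __name__ == "__main__":
    setup = ExperimentSetup()
    print(setup)
    # The re-split SQuAD development set should add back up to 10,570 instances.
    print("SQuAD dev + test:", split_total("SQuAD"))
```

A record like this makes it easy to confirm that the quoted SQuAD re-split (5,000 + 5,570) matches the stated 10,570 development instances.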