Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Paraphrase Generation with Latent Bag of Words
Authors: Yao Fu, Yansong Feng, John P. Cunningham
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the transparent and effective generation process of this model. |
| Researcher Affiliation | Academia | Yao Fu, Department of Computer Science, Columbia University, EMAIL; Yansong Feng, Institute of Computer Science and Technology, Peking University, EMAIL; John P. Cunningham, Department of Statistics, Columbia University, EMAIL |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Our code can be found at https://github.com/FranxYao/dgm_latent_bow |
| Open Datasets | Yes | Following the settings in previous works [26, 15], we use the Quora dataset and the MSCOCO [28] dataset for our experiments. ... For the Quora dataset, there are 50K training instances and 20K testing instances, and the vocabulary size is 8K. For the MSCOCO dataset, there are 94K training instances and 23K testing instances, and the vocabulary size is 11K. |
| Dataset Splits | No | The paper only explicitly mentions 'training instances' and 'testing instances' with specific counts, but does not provide details on a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions LSTMs [18] and Adam [23] as components but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We set the maximum sentence length for the two datasets to be 16. ... The Seq2seq-Attn model is trained with 500 state size and 2 stacked LSTM layers. ... Experiments are repeated three times with different random seeds. The average performance is reported. More configuration details are listed in the appendix. |
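
To make the quoted Experiment Setup row concrete, the sketch below shows one plausible reading of the reported Seq2seq-Attn baseline configuration (500 state size, 2 stacked LSTM layers, maximum sentence length 16, Adam optimizer, three random seeds). This is a hypothetical PyTorch illustration, not the authors' implementation (their code is in the repository linked above); the attention variant, learning rate, embedding size, and vocabulary handling are assumptions not stated in the quoted text.

```python
# Hypothetical sketch of the reported Seq2seq-Attn baseline configuration.
# Not the authors' code; see https://github.com/FranxYao/dgm_latent_bow for theirs.
import torch
import torch.nn as nn

MAX_LEN = 16         # maximum sentence length used for both datasets
STATE_SIZE = 500     # LSTM state size reported for the Seq2seq-Attn baseline
NUM_LAYERS = 2       # stacked LSTM layers
VOCAB_SIZE = 8000    # Quora vocabulary size (MSCOCO uses ~11K)

class Seq2SeqAttnSketch(nn.Module):
    """Encoder-decoder skeleton matching the reported hyperparameters."""
    def __init__(self, vocab_size=VOCAB_SIZE, emb_dim=STATE_SIZE):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, STATE_SIZE, NUM_LAYERS, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, STATE_SIZE, NUM_LAYERS, batch_first=True)
        # Attention variant is an assumption; the paper only names "Seq2seq-Attn".
        self.attn = nn.MultiheadAttention(STATE_SIZE, num_heads=1, batch_first=True)
        self.out = nn.Linear(STATE_SIZE, vocab_size)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.embed(src))          # encode source sentence
        dec_out, _ = self.decoder(self.embed(tgt), state)       # decode from final encoder state
        ctx, _ = self.attn(dec_out, enc_out, enc_out)            # attend over encoder outputs
        return self.out(ctx + dec_out)                           # per-token vocabulary logits

# "Experiments are repeated three times with different random seeds. The average
# performance is reported." -- training loop itself is omitted from this sketch.
for seed in (0, 1, 2):
    torch.manual_seed(seed)
    model = Seq2SeqAttnSketch()
    optimizer = torch.optim.Adam(model.parameters())  # Adam [23]; learning rate unspecified here
```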