Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Paraphrase Generation with Latent Bag of Words
Authors: Yao Fu, Yansong Feng, John P. Cunningham
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the transparent and effective generation process of this model. |
| Researcher Affiliation | Academia | Yao Fu, Department of Computer Science, Columbia University, EMAIL; Yansong Feng, Institute of Computer Science and Technology, Peking University, EMAIL; John P. Cunningham, Department of Statistics, Columbia University, EMAIL |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Our code can be found at https://github.com/FranxYao/dgm_latent_bow |
| Open Datasets | Yes | Following the settings in previous works [26, 15], we use the Quora dataset and the MSCOCO [28] dataset for our experiments. ... For the Quora dataset, there are 50K training instances and 20K testing instances, and the vocabulary size is 8K. For the MSCOCO dataset, there are 94K training instances and 23K testing instances, and the vocabulary size is 11K. |
| Dataset Splits | No | The paper only explicitly mentions 'training instances' and 'testing instances' with specific counts, but does not provide details on a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions LSTMs [18] and Adam [23] as components but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | We set the maximum sentence length for the two datasets to be 16. ... The Seq2seq-Attn model is trained with 500 state size and 2 stacked LSTM layers. ... Experiments are repeated three times with different random seeds. The average performance is reported. More configuration details are listed in the appendix. |
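
To make the quoted Experiment Setup row concrete, the sketch below shows one plausible reading of the reported Seq2seq-Attn baseline configuration (500 state size, 2 stacked LSTM layers, maximum sentence length 16, Adam optimizer, three random seeds). This is a hypothetical PyTorch illustration, not the authors' implementation (their code is in the repository linked above); the attention variant, learning rate, embedding size, and vocabulary handling are assumptions not stated in the quoted text.

```python
# Hypothetical sketch of the reported Seq2seq-Attn baseline configuration.
# Not the authors' code; see https://github.com/FranxYao/dgm_latent_bow for theirs.
import torch
import torch.nn as nn

MAX_LEN = 16         # maximum sentence length used for both datasets
STATE_SIZE = 500     # LSTM state size reported for the Seq2seq-Attn baseline
NUM_LAYERS = 2       # stacked LSTM layers
VOCAB_SIZE = 8000    # Quora vocabulary size (MSCOCO uses ~11K)

class Seq2SeqAttnSketch(nn.Module):
    """Encoder-decoder skeleton matching the reported hyperparameters."""
    def __init__(self, vocab_size=VOCAB_SIZE, emb_dim=STATE_SIZE):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, STATE_SIZE, NUM_LAYERS, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, STATE_SIZE, NUM_LAYERS, batch_first=True)
        # Attention variant is an assumption; the paper only names "Seq2seq-Attn".
        self.attn = nn.MultiheadAttention(STATE_SIZE, num_heads=1, batch_first=True)
        self.out = nn.Linear(STATE_SIZE, vocab_size)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.embed(src))          # encode source sentence
        dec_out, _ = self.decoder(self.embed(tgt), state)       # decode from final encoder state
        ctx, _ = self.attn(dec_out, enc_out, enc_out)            # attend over encoder outputs
        return self.out(ctx + dec_out)                           # per-token vocabulary logits

# "Experiments are repeated three times with different random seeds. The average
# performance is reported." -- training loop itself is omitted from this sketch.
for seed in (0, 1, 2):
    torch.manual_seed(seed)
    model = Seq2SeqAttnSketch()
    optimizer = torch.optim.Adam(model.parameters())  # Adam [23]; learning rate unspecified here
```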