Towards Universal Paraphrastic Sentence Embeddings

Authors: John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare six compositional architectures, evaluating them on annotated textual similarity datasets drawn both from the same distribution as the training data and from a wide range of other domains. We find that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data. However, in out-of-domain scenarios, simple architectures such as word averaging vastly outperform LSTMs. (A toy word-averaging sketch appears after this table.)
Researcher Affiliation | Academia | John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu; Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA; {jwieting,mbansal,kgimpel,klivescu}@ttic.edu
Pseudocode | No | The paper describes mathematical formulations of its models but does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Trained models and code for training and evaluation are available at http://ttic.uchicago.edu/~wieting.
Open Datasets | Yes | Our training data consists of (possibly noisy) pairs taken directly from the original Paraphrase Database (PPDB) and we optimize a margin-based loss. (A sketch of a hinge loss of this general form follows the table.)
Dataset Splits | Yes | However, for hyperparameter tuning we only used 100k examples sampled from PPDB XXL and trained for 5 epochs. Then after finding the hyperparameters that maximize Spearman’s ρ on the Pavlick et al. PPDB task, we trained on the entire XL section of PPDB for 10 epochs.
Hardware Specification | No | The paper states: 'We would also like to thank the developers of Theano (Bergstra et al., 2010; Bastien et al., 2012) and thank NVIDIA Corporation for donating GPUs used in this research.' However, it does not specify the model or type of GPUs or any other hardware components used for the experiments.
Software Dependencies | No | The paper mentions using Theano and refers to optimizers such as AdaGrad and Adam and toolkits such as Stanford CoreNLP and NLTK, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Our models have the following tunable hyperparameters: λc, the L2 regularizer on the compositional parameters Wc (not applicable for the word averaging model), the pool of phrases used to obtain negative examples (coupled with mini-batch size B, to reduce the number of tunable hyperparameters), λw, the regularizer on the word embeddings, and δ, the margin. We also tune over the optimization method (either AdaGrad (Duchi et al., 2011) or Adam (Kingma & Ba, 2014)), the learning rate (from {0.05, 0.005, 0.0005}), whether to clip the gradients with threshold 1 (Pascanu et al., 2012), and whether to use MIX or MAX sampling. (An illustrative reconstruction of this search space follows the table.)
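
The Research Type row contrasts LSTM composition with simple word averaging. As a rough illustration of the word-averaging model only (not the authors' released code or their pretrained vectors), the sketch below averages toy word vectors into sentence embeddings and compares two sentences by cosine similarity; the vocabulary, dimensionality, and vector values are placeholders.

    import numpy as np

    # Toy stand-ins for pretrained word embeddings; the paper's models start from
    # real pretrained vectors, but random 300-d vectors suffice for a sketch.
    rng = np.random.default_rng(0)
    vocab = {w: rng.standard_normal(300)
             for w in "the cat sat on a mat feline rested rug".split()}

    def avg_embedding(sentence):
        """Word-averaging composition: the mean of the word vectors in a sentence."""
        vecs = [vocab[w] for w in sentence.lower().split() if w in vocab]
        return np.mean(vecs, axis=0)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    s1 = avg_embedding("the cat sat on a mat")
    s2 = avg_embedding("a feline rested on a rug")
    print(cosine(s1, s2))  # similarity score for the sentence pair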
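
The Open Datasets row notes that training minimizes a margin-based loss over PPDB phrase pairs. Below is a hedged sketch of a hinge loss of that general shape: it encourages the members of a paraphrase pair to be more similar to each other than to sampled negative examples by at least a margin. The regularization terms and the MAX/MIX negative-sampling procedure described in the paper are omitted, and the default margin here is a placeholder, not a tuned value.

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def margin_loss(x1, x2, t1, t2, delta=0.4):
        """Hinge loss for one paraphrase pair (x1, x2) with negatives t1, t2.

        x1, x2, t1, t2 are sentence embeddings (e.g. from word averaging or an
        LSTM). The pair is pushed to be more similar than each member is to its
        negative example by at least the margin delta (0.4 is a placeholder).
        """
        sim_pos = cosine(x1, x2)
        return (max(0.0, delta - sim_pos + cosine(x1, t1)) +
                max(0.0, delta - sim_pos + cosine(x2, t2)))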
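
The Experiment Setup row reads naturally as a hyperparameter search space. The dictionary below is only an illustrative reconstruction: the optimizer choices, learning rates, gradient-clipping threshold of 1, and MIX/MAX sampling options come from the quoted text, while the entries given as prose strings are left as notes because the excerpt does not list their candidate values.

    # Illustrative reconstruction of the tuned hyperparameters; entries given as
    # strings are described in the excerpt without explicit candidate values.
    search_space = {
        "lambda_c": "L2 on compositional parameters Wc (not used for word averaging)",
        "lambda_w": "L2 regularizer on the word embeddings",
        "delta": "margin in the hinge loss",
        "batch_size_B": "tuned jointly with the pool of negative examples",
        "optimizer": ["AdaGrad", "Adam"],        # Duchi et al. 2011; Kingma & Ba 2014
        "learning_rate": [0.05, 0.005, 0.0005],  # values listed in the paper
        "clip_gradients": [True, False],         # threshold of 1 when enabled
        "negative_sampling": ["MIX", "MAX"],     # how negative examples are chosen
    }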