Siamese Recurrent Architectures for Learning Sentence Similarity

Authors: Jonas Mueller, Aditya Thyagarajan

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model is applied to assess semantic similarity between sentences, where we exceed state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014).
Researcher Affiliation | Academia | Jonas Mueller, Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology; Aditya Thyagarajan, Department of Computer Science and Engineering, M. S. Ramaiah Institute of Technology
Pseudocode | No | The paper provides mathematical equations for the LSTM updates but does not include any formally labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper notes that the word2vec embeddings are 'Publicly available at: code.google.com/p/word2vec' but provides no statement or link for open-source code of its own proposed Manhattan LSTM (MaLSTM) model or methodology.
Open Datasets | Yes | The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). We use the 300-dimensional word2vec embeddings (publicly available at: code.google.com/p/word2vec), which Mikolov et al. (2013) demonstrate can capture intricate inter-word relationships such as vec(king) − vec(man) + vec(woman) ≈ vec(queen).
Dataset Splits | Yes | The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). We employ early-stopping based on a validation set containing 30% of the training examples.
Hardware Specification | No | The paper discusses the training process and optimization methods but does not provide any specific details about the hardware (e.g., CPU or GPU models, memory, or cloud resources) used to run the experiments.
Software Dependencies | No | The paper mentions the Adadelta optimization method and the word2vec embeddings but does not specify any software libraries or dependencies with version numbers that would be required to reproduce the experiments.
Experiment Setup | Yes | Our LSTM uses 50-dimensional hidden representations h_t and memory cells c_t. Optimization of the parameters is done using the Adadelta method of Zeiler (2012) along with gradient clipping (rescaling gradients whose norm exceeds a threshold) to avoid the exploding gradients problem (Pascanu, Mikolov, and Bengio 2013). We employ early-stopping based on a validation set containing 30% of the training examples. We first initialize our LSTM weights with small random Gaussian entries (and a separate large value of 2.5 for the forget gate bias to facilitate modeling of long range dependence).
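
To make the quoted word-vector relationship concrete, the snippet below queries the publicly available 300-dimensional embeddings for the vec(king) − vec(man) + vec(woman) ≈ vec(queen) analogy. It is an illustrative sketch only, not code from the paper: the gensim library and the GoogleNews-vectors-negative300.bin file name are assumptions.

```python
# Illustrative only: assumes gensim and the public GoogleNews word2vec binary;
# neither is code released by the paper's authors.
from gensim.models import KeyedVectors

# Load the 300-dimensional pre-trained vectors (file name assumed).
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# vec(king) - vec(man) + vec(woman) ~= vec(queen)
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns [('queen', ...)] as the nearest neighbor with these vectors.
```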
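
The dataset-splits row works out to 3,500 training, 1,500 validation, and 4,927 test pairs once 30% of the 5,000 official training pairs are held out for early stopping. The following minimal sketch shows that partitioning; the split_sick function, the tuple layout of pairs, and the shuffling seed are hypothetical, not details from the paper.

```python
import random

def split_sick(pairs, val_fraction=0.30, seed=0):
    """Partition SICK's 9,927 (sentence_a, sentence_b, relatedness) pairs.

    Assumes `pairs` is ordered with the official 5,000 training pairs first,
    followed by the 4,927 test pairs.
    """
    assert len(pairs) == 9927
    train, test = list(pairs[:5000]), list(pairs[5000:])

    # Hold out 30% of the training pairs as a validation set for early stopping.
    random.Random(seed).shuffle(train)
    n_val = int(val_fraction * len(train))        # 1,500 pairs
    val, train = train[:n_val], train[n_val:]     # 1,500 validation, 3,500 training
    return train, val, test
```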
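
Because no official implementation is released, the experiment-setup details above can only be reassembled as a best-effort sketch. The PyTorch code below ties a single shared LSTM over both sentences and scores the pair with the Manhattan similarity exp(−||h_a − h_b||_1) that gives MaLSTM its name, using the quoted hyperparameters (300-d word2vec inputs, 50-d hidden state, forget-gate bias 2.5, Adadelta, gradient clipping). The MSE regression loss, the 0.1 standard deviation of the Gaussian initialization, and the 5.0 clipping threshold are assumptions not stated in the excerpts.

```python
# Best-effort PyTorch reconstruction of the MaLSTM setup quoted above; anything
# marked "assumed" is an illustrative choice, not a detail from the paper.
import torch
import torch.nn as nn


class MaLSTM(nn.Module):
    """Siamese LSTM scoring sentence pairs with the Manhattan similarity
    g(h_a, h_b) = exp(-||h_a - h_b||_1), which lies in (0, 1]."""

    def __init__(self, input_dim=300, hidden_dim=50):  # 300-d word2vec, 50-d hidden
        super().__init__()
        # One shared LSTM encodes both sentences (Siamese weight tying).
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self._init_weights(hidden_dim)

    def _init_weights(self, hidden_dim):
        # "Small random Gaussian entries" for all LSTM parameters (std assumed).
        for p in self.lstm.parameters():
            nn.init.normal_(p, mean=0.0, std=0.1)
        # PyTorch keeps two additive bias vectors with gate order
        # (input, forget, cell, output); make their forget-gate slices sum to 2.5.
        f = slice(hidden_dim, 2 * hidden_dim)
        self.lstm.bias_ih_l0.data[f] = 2.5
        self.lstm.bias_hh_l0.data[f] = 0.0

    def encode(self, x):
        # x: (batch, seq_len, 300) word2vec embeddings; packing/masking of
        # variable-length sentences is omitted for brevity.
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]                      # final hidden state of each sentence

    def forward(self, sent_a, sent_b):
        h_a, h_b = self.encode(sent_a), self.encode(sent_b)
        return torch.exp(-torch.sum(torch.abs(h_a - h_b), dim=1))


model = MaLSTM()
optimizer = torch.optim.Adadelta(model.parameters())   # Zeiler (2012)
loss_fn = nn.MSELoss()  # regression loss on rescaled relatedness scores (assumed)


def train_step(sent_a, sent_b, target):
    optimizer.zero_grad()
    loss = loss_fn(model(sent_a, sent_b), target)
    loss.backward()
    # Gradient clipping: rescale gradients whose norm exceeds a threshold
    # (the 5.0 threshold is an assumption, not stated in the excerpts).
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()
```

Early stopping on the 30% validation split described above would wrap train_step in a standard patience-based loop, which is omitted here.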