Siamese Recurrent Architectures for Learning Sentence Similarity
Authors: Jonas Mueller, Aditya Thyagarajan
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is applied to assess semantic similarity between sentences, where we exceed state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). |
| Researcher Affiliation | Academia | Jonas Mueller Computer Science & Artificial Intelligence Laboratory Massachusetts Institute of Technology Aditya Thyagarajan Department of Computer Science and Engineering M. S. Ramaiah Institute of Technology |
| Pseudocode | No | The paper provides mathematical equations for LSTM updates but does not include any formally labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references 'word2vec embeddings' as 'Publicly available at: code.google.com/p/word2vec' but does not provide a statement or link for the open-source code of their own proposed Manhattan LSTM model or methodology. |
| Open Datasets | Yes | The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). We use the 300-dimensional word2vec embeddings which Mikolov et al. (2013) demonstrate can capture intricate inter-word relationships such as vec(king) − vec(man) + vec(woman) ≈ vec(queen). Publicly available at: code.google.com/p/word2vec (see the embedding-loading sketch below this table). |
| Dataset Splits | Yes | The SICK data contains 9927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). We employ early-stopping based on a validation set containing 30% of the training examples. |
| Hardware Specification | No | The paper discusses the training process and optimization methods but does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimization methods like 'Adadelta' and uses 'word2vec embeddings' but does not specify any software libraries or dependencies with version numbers that would be required to reproduce the experiments. |
| Experiment Setup | Yes | Our LSTM uses 50-dimensional hidden representations h_t and memory cells c_t. Optimization of the parameters is done using the Adadelta method of Zeiler (2012) along with gradient clipping (rescaling gradients whose norm exceeds a threshold) to avoid the exploding gradients problem (Pascanu, Mikolov, and Bengio 2013). We employ early-stopping based on a validation set containing 30% of the training examples. We first initialize our LSTM weights with small random Gaussian entries (and a separate large value of 2.5 for the forget gate bias to facilitate modeling of long range dependence). (See the model and training sketches below this table.) |
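
The Open Datasets row quotes the paper's word-analogy example for the pretrained embeddings. The paper only points to the public word2vec release and provides no loading code, so the snippet below is a minimal sketch: the use of gensim and the `GoogleNews-vectors-negative300.bin` filename are assumptions, not details from the paper.

```python
from gensim.models import KeyedVectors

# Load 300-dimensional word2vec vectors from the public release referenced in
# the paper (code.google.com/p/word2vec). The filename is the usual Google
# News distribution name and is assumed here, not stated in the paper.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# The analogy quoted above: vec(king) - vec(man) + vec(woman) ≈ vec(queen).
print(w2v.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected nearest neighbour: ('queen', <cosine similarity>)
```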
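The paper gives the LSTM updates and the Manhattan similarity g(h_a, h_b) = exp(−‖h_a − h_b‖₁) only as equations, with no pseudocode or released code. The following is therefore a hedged sketch of the Siamese encoder using only details quoted in the table (300-d word2vec inputs, 50-d hidden states, forget-gate bias of 2.5); the choice of PyTorch and every detail not quoted are assumptions.

```python
import torch
import torch.nn as nn

class MaLSTM(nn.Module):
    """Sketch of the Manhattan LSTM: one shared ("Siamese") LSTM encodes each
    sentence; similarity is exp(-||h_a - h_b||_1), which lies in (0, 1]."""

    def __init__(self, embedding_dim=300, hidden_dim=50):
        super().__init__()
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        # The paper initializes weights with small random values and sets the
        # forget-gate bias to 2.5; here we only override the forget-gate slice
        # of PyTorch's default initialization.
        for name, param in self.lstm.named_parameters():
            if "bias" in name:
                hidden = param.size(0) // 4
                param.data[hidden:2 * hidden].fill_(2.5)  # forget-gate portion

    def forward(self, sent_a, sent_b):
        # sent_a, sent_b: (batch, seq_len, 300) sequences of word2vec embeddings.
        _, (h_a, _) = self.lstm(sent_a)
        _, (h_b, _) = self.lstm(sent_b)
        # Final hidden state of each sentence; Manhattan distance between them.
        diff = torch.abs(h_a[-1] - h_b[-1]).sum(dim=1)
        return torch.exp(-diff)  # similarity score in (0, 1]
```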
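Likewise, the training details quoted in the Dataset Splits and Experiment Setup rows (Adadelta, gradient clipping, early stopping on a 30% validation split) can be sketched as below. The clipping threshold, patience, epoch count, full-batch updates, and MSE loss on rescaled relatedness scores are assumptions not fixed by the quoted text, and the tensor names are placeholders.

```python
import torch

# Assumed inputs (names are illustrative): padded embedding tensors
# train_a, train_b of shape (N, seq_len, 300) and gold relatedness scores
# train_y rescaled into (0, 1]. MaLSTM is the class from the previous sketch.
model = MaLSTM()
optimizer = torch.optim.Adadelta(model.parameters())
loss_fn = torch.nn.MSELoss()

# Paper: hold out 30% of the training pairs as a validation set for early stopping.
n = train_a.size(0)
perm = torch.randperm(n)
split = int(0.7 * n)
tr, va = perm[:split], perm[split:]

best_val, patience, bad_epochs = float("inf"), 5, 0  # patience value is assumed
for epoch in range(100):  # epoch budget is assumed
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(train_a[tr], train_b[tr]), train_y[tr])
    loss.backward()
    # Gradient clipping by norm, as in the paper (threshold not reported there).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(train_a[va], train_b[va]), train_y[va]).item()
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping on the held-out 30%
```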