Learning semantic similarity in a continuous space
Authors: Michel Deudon
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach is evaluated on the Quora duplicate questions dataset and performs strongly. ... Table 1 compares different models for this task on the Quora dataset. |
| Researcher Affiliation | Academia | Michel Deudon, Ecole Polytechnique, Palaiseau, France, michel.deudon@polytechnique.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is made publicly available on github. [https://github.com/MichelDeudon/variational-siamese-network] |
| Open Datasets | Yes | We evaluated our proposed framework on the Quora question pairs dataset, which consists of 404k sentence pairs annotated with a binary value that indicates whether a pair is duplicate (same intent) or not. [https://www.kaggle.com/quora/question-pairs-dataset] |
| Dataset Splits | No | The paper mentions a 'Dev set' in Table 1 and Figure 5, implying a validation split, but it does not give the size, percentage, or methodology of this split, stating only 'The split considered is that of BiMPM [47]'. |
| Hardware Specification | Yes | For a given query, our model runs 3500+ comparisons per second on two Tesla K80 GPUs. ... All queries were retrieved in less than a second on two Tesla K80 GPUs. |
| Software Dependencies | Yes | We implemented our model using python 3.5.4, tensorflow 1.3.0 [42], gensim 3.0.1 [43] and nltk 3.2.4 [44]. |
| Experiment Setup | Yes | Our variational space (µ, σ) is of dimension h = 1000. Our bi-LSTM encoder network consists of a single layer of 2h neurons and our LSTM [31] decoder has a single layer with 1000 neurons. Our MLP's inner layer has 1000 neurons. All weights were randomly initialized with the 'Xavier' initializer [45]. ... We employ stochastic gradient descent with the ADAM optimizer [46] (lr = 0.001, β1 = 0.9, β2 = 0.999) and batches of size 256 and 128. Our learning rate is initialized for both tasks to 0.001 and decayed every 5000 steps by a factor of 0.96 with an exponential scheme. We clip the L2 norm of our gradients to 1.0 to avoid exploding gradients in deep neural networks. (A hedged configuration sketch follows the table.) |
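The setup quoted above is detailed enough to reconstruct the optimizer configuration almost verbatim. The sketch below is a minimal, hypothetical reconstruction against the TensorFlow 1.x API the paper reports (tensorflow 1.3.0); it is not the authors' released code, the stand-in loss is a placeholder, and the staircase decay and global-norm clipping are assumptions where the paper only says 'exponential scheme' and 'clip the L2 norm'.

```python
# Hypothetical reconstruction of the reported training configuration -- not the
# authors' code (see https://github.com/MichelDeudon/variational-siamese-network
# for the released implementation). Written against the TensorFlow 1.x API.
import tensorflow as tf

h = 1000          # dimensionality of the variational space (mu, sigma)
batch_size = 256  # the paper reports batches of size 256 and 128
clip_norm = 1.0   # reported L2 gradient-clipping threshold

# 'Xavier' weight initialization, as reported (available in TF 1.3 via contrib).
xavier = tf.contrib.layers.xavier_initializer()

global_step = tf.Variable(0, trainable=False, name="global_step")

# Learning rate 0.001, decayed by 0.96 every 5000 steps with an exponential scheme.
# staircase=True is an assumption; the paper does not say whether the decay is stepped.
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=5000, decay_rate=0.96, staircase=True)

# Stand-in scalar loss so the snippet runs; the real objective is the paper's
# variational / siamese training loss.
w = tf.get_variable("w", shape=[h], initializer=xavier)
loss = tf.reduce_mean(tf.square(w))

# ADAM with the reported betas, plus gradient clipping to avoid exploding gradients.
optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999)
grads, variables = zip(*optimizer.compute_gradients(loss))
clipped, _ = tf.clip_by_global_norm(grads, clip_norm)  # global-norm clipping assumed
train_op = optimizer.apply_gradients(list(zip(clipped, variables)),
                                     global_step=global_step)
```

The architecture itself (bi-LSTM encoder with a single layer of 2h units, LSTM decoder with 1000 units, MLP with a 1000-unit inner layer) and the second batch size are deliberately left out of this sketch; the released repository linked above is the authoritative reference.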