Learning semantic similarity in a continuous space

Authors: Michel Deudon

Venue: NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach is evaluated on the Quora duplicate questions dataset and performs strongly. ... Table 1 compares different models for this task on the Quora dataset.
Researcher Affiliation | Academia | Michel Deudon, Ecole Polytechnique, Palaiseau, France (michel.deudon@polytechnique.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is made publicly available on github. [https://github.com/MichelDeudon/variational-siamese-network]
Open Datasets | Yes | We evaluated our proposed framework on the Quora question pairs dataset, which consists of 404k sentence pairs annotated with a binary value that indicates whether a pair is duplicate (same intent) or not. [https://www.kaggle.com/quora/question-pairs-dataset] (see the loading sketch below)
Dataset Splits | No | The paper mentions a 'Dev set' in Table 1 and Figure 5, implying a validation split, but it gives no size, percentage, or methodology for that split, stating only that 'The split considered is that of BiMPM [47]'.
Hardware Specification | Yes | For a given query, our model runs 3500+ comparisons per second on two Tesla K80. ... All queries were retrieved in less than a second on two Tesla K80 GPUs.
Software Dependencies | Yes | We implemented our model using python 3.5.4, tensorflow 1.3.0 [42], gensim 3.0.1 [43] and nltk 3.2.4 [44].
Experiment Setup | Yes | Our variational space (µ, σ) is of dimension h = 1000. Our bi-LSTM encoder network consists of a single layer of 2h neurons and our LSTM [31] decoder has a single layer with 1000 neurons. Our MLP's inner layer has 1000 neurons. All weights were randomly initialized with the 'Xavier' initializer [45]. ... We employ stochastic gradient descent with the ADAM optimizer [46] (lr = 0.001, β1 = 0.9, β2 = 0.999) and batches of size 256 and 128. Our learning rate is initialized for both tasks to 0.001 and decayed every 5000 steps by a factor of 0.96 with an exponential scheme. We clip the L2 norm of our gradients to 1.0 to avoid exploding gradients in deep neural networks. (see the configuration sketch below)
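
The Experiment Setup row gives enough detail to sketch the reported training configuration against the TensorFlow 1.x API listed under Software Dependencies. The snippet below is a minimal illustration, not the authors' code (which is linked under Open Source Code); the function names (encode, training_op), the staircase-style decay, and the choice of h units per LSTM direction are assumptions made for the sketch.

    import tensorflow as tf

    h = 1000          # dimension of the variational space (mu, sigma)
    batch_size = 256  # the paper reports batches of size 256 and 128

    # Exponential learning-rate schedule: start at 0.001, decay by 0.96 every 5000 steps.
    global_step = tf.Variable(0, trainable=False, name="global_step")
    learning_rate = tf.train.exponential_decay(
        0.001, global_step, decay_steps=5000, decay_rate=0.96, staircase=True)

    # 'Xavier' initialization for all weights.
    initializer = tf.contrib.layers.xavier_initializer()

    def encode(inputs, sequence_length):
        """Single-layer bi-LSTM encoder; h units per direction gives 2h-dimensional states."""
        cell_fw = tf.contrib.rnn.LSTMCell(h, initializer=initializer)
        cell_bw = tf.contrib.rnn.LSTMCell(h, initializer=initializer)
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw, cell_bw, inputs, sequence_length=sequence_length, dtype=tf.float32)
        return tf.concat(outputs, axis=-1)  # concatenated forward/backward states, 2h dims

    def training_op(loss):
        """ADAM (lr schedule above, beta1=0.9, beta2=0.999) with the gradient L2 norm clipped to 1.0."""
        optimizer = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999)
        grads, variables = zip(*optimizer.compute_gradients(loss))
        clipped, _ = tf.clip_by_global_norm(grads, 1.0)
        return optimizer.apply_gradients(list(zip(clipped, variables)), global_step=global_step)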
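
For orientation on the Open Datasets row, the sketch below reads the Kaggle CSV of Quora question pairs and counts pairs and duplicate labels. The file name (questions.csv) and the column names (question1, question2, is_duplicate) are assumptions about the Kaggle release rather than details stated in the paper.

    import csv

    # Read the Kaggle CSV of Quora question pairs and count duplicate labels.
    # File and column names are assumed; adjust them to the file you download.
    pairs = []
    with open("questions.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((row["question1"], row["question2"], int(row["is_duplicate"])))

    print("pairs:", len(pairs))  # roughly 404k pairs, as reported in the paper
    print("duplicate fraction:", sum(label for _, _, label in pairs) / len(pairs))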