Skip-Thought Vectors

Authors: Ryan Kiros, Yukun Zhu, Russ R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, Sanja Fidler

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets.
Researcher Affiliation | Academia | University of Toronto; Canadian Institute for Advanced Research; Massachusetts Institute of Technology
Pseudocode | No | The paper includes mathematical equations for the encoder, decoder, and objective function, but no formally labeled 'Pseudocode' or 'Algorithm' block.
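Although the row above notes that the paper gives equations rather than pseudocode, the model is compact enough to sketch. The snippet below is a hedged illustration only, not the authors' implementation: it uses GRU encoder/decoders, conditions the decoders by concatenating the thought vector with the decoder inputs (the paper conditions the GRU gates directly), and the embedding and hidden sizes are placeholder values.

```python
# Hedged sketch of a skip-thought style encoder/decoder (not the authors' code).
# Assumptions: GRU layers, conditioning by concatenation rather than the
# gate-level conditioning in the paper, and placeholder layer sizes.
import torch
import torch.nn as nn

class SkipThoughtSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Two decoders: one reconstructs the previous sentence, one the next.
        self.dec_prev = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.dec_next = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, cur, prev, nxt):
        # Encode the current sentence; the final hidden state is the
        # "skip-thought" vector used downstream as the sentence feature.
        _, h = self.encoder(self.embed(cur))            # (1, B, hid_dim)
        thought = h.transpose(0, 1)                     # (B, 1, hid_dim)

        def decode(decoder, target):
            emb = self.embed(target)                    # (B, T, emb_dim)
            cond = thought.expand(-1, emb.size(1), -1)  # broadcast over time
            hidden, _ = decoder(torch.cat([emb, cond], dim=-1))
            return self.out(hidden)                     # (B, T, vocab_size)

        # Training would sum cross-entropy losses over both outputs,
        # mirroring the paper's sum of log-probabilities objective.
        return decode(self.dec_prev, prev), decode(self.dec_next, nxt)
```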
Open Source Code | No | The paper refers to 'The publically available CBOW word vectors are used for this purpose' (footnote: http://code.google.com/p/word2vec/). This is a third-party tool used by the authors, not the open-source code for the methodology described in this paper.
Open Datasets | Yes | We chose to use a large collection of novels, namely the BookCorpus dataset [9] for training our models. ... Our first experiment is on the SemEval 2014 Task 1: semantic relatedness SICK dataset [30]. ... The next task we consider is paraphrase detection on the Microsoft Research Paraphrase Corpus [31]. ... For this experiment, we use the Microsoft COCO dataset [35]. ... We use 5 datasets: movie review sentiment (MR), customer product reviews (CR), subjectivity/objectivity classification (SUBJ), opinion polarity (MPQA) and question-type classification (TREC).
Dataset Splits | Yes | The dataset comes with a predefined split of 4500 training pairs, 500 development pairs and 4927 testing pairs. ... 10-fold cross-validation is used for evaluation on the first 4 datasets, while TREC has a pre-defined train/test split. We tune the L2 penalty using cross-validation (and thus use a nested cross-validation for the first 4 datasets).
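The protocol quoted in the row above (10-fold evaluation with an inner cross-validation to tune the L2 penalty) can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the scikit-learn estimators, the penalty grid, and the placeholder features X and labels y are choices made here for clarity.

```python
# Hedged sketch of the nested cross-validation protocol quoted above.
# X and y are placeholders; the estimator, penalty grid, and library choice
# (scikit-learn) are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X = np.random.randn(200, 4800)          # placeholder sentence features
y = np.random.randint(0, 2, size=200)   # placeholder binary labels

# Inner loop: tune the L2 penalty (C is the inverse regularization strength).
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)

# Outer loop: 10-fold cross-validation gives the reported score.
scores = cross_val_score(inner, X, y, cv=10)
print("nested 10-fold accuracy: %.3f" % scores.mean())
```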
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper. It only mentions 'Mini-batches of size 128 are used' and that models were trained for 'roughly two weeks', without specifying the underlying hardware.
Software Dependencies | No | The paper mentions the 'Adam algorithm [17]' for optimization and 'CBOW word vectors' from word2vec (linking to http://code.google.com/p/word2vec/), but does not provide specific version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | Yes | Mini-batches of size 128 are used and gradients are clipped if the norm of the parameter vector exceeds 10. We used the Adam algorithm [17] for optimization. Both models were trained for roughly two weeks. ... To represent a sentence pair, we use two features. Given two skip-thought vectors u and v, we compute their component-wise product u · v and their absolute difference |u − v| and concatenate them together. ... In our experiments, we use a 1000 dimensional embedding, margin α = 0.2 and k = 50 contrastive terms. We trained for 15 epochs and saved our model anytime the performance improved on the development set.
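The sentence-pair features quoted in the row above (component-wise product and absolute difference, concatenated) are straightforward to reproduce; a minimal sketch, assuming NumPy arrays and placeholder vector dimensions, is shown below.

```python
# Minimal sketch of the sentence-pair features described above; the vector
# dimension and random contents are placeholders.
import numpy as np

def pair_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Concatenate the component-wise product u * v and |u - v|."""
    return np.concatenate([u * v, np.abs(u - v)])

u = np.random.randn(2400)   # placeholder skip-thought vector, sentence A
v = np.random.randn(2400)   # placeholder skip-thought vector, sentence B
features = pair_features(u, v)
print(features.shape)       # (4800,) -- input to a linear classifier/regressor
```

For the training hyperparameters quoted in the same row, the gradient clipping and Adam settings map onto standard utilities (e.g., torch.nn.utils.clip_grad_norm_ with max_norm=10 and torch.optim.Adam), though the authors' actual training code is not specified in the paper.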