Skip-Thought Vectors
Authors: Ryan Kiros, Yukun Zhu, Russ R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, Sanja Fidler
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. |
| Researcher Affiliation | Academia | University of Toronto; Canadian Institute for Advanced Research; Massachusetts Institute of Technology |
| Pseudocode | No | The paper includes mathematical equations for the encoder, decoder, and objective function, but no formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper refers to 'The publicly available CBOW word vectors are used for this purpose' (http://code.google.com/p/word2vec/). This is a third-party resource used by the authors, not open-source code for the methodology described in this paper. |
| Open Datasets | Yes | We chose to use a large collection of novels, namely the BookCorpus dataset [9] for training our models. ... Our first experiment is on the SemEval 2014 Task 1: semantic relatedness SICK dataset [30]. ... The next task we consider is paraphrase detection on the Microsoft Research Paraphrase Corpus [31]. ... For this experiment, we use the Microsoft COCO dataset [35]. ... We use 5 datasets: movie review sentiment (MR), customer product reviews (CR), subjectivity/objectivity classification (SUBJ), opinion polarity (MPQA) and question-type classification (TREC). |
| Dataset Splits | Yes | The dataset comes with a predefined split of 4500 training pairs, 500 development pairs and 4927 testing pairs. ... 10-fold cross-validation is used for evaluation on the first 4 datasets, while TREC has a pre-defined train/test split. We tune the L2 penalty using cross-validation (and thus use a nested cross-validation for the first 4 datasets). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper. It only mentions 'Mini-batches of size 128 are used' and that models were trained for 'roughly two weeks', without specifying the underlying hardware. |
| Software Dependencies | No | The paper mentions 'Adam algorithm [17]' for optimization and 'CBOW word vectors' from word2vec (linking to http://code.google.com/p/word2vec/), but does not provide specific version numbers for any software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | Mini-batches of size 128 are used and gradients are clipped if the norm of the parameter vector exceeds 10. We used the Adam algorithm [17] for optimization. Both models were trained for roughly two weeks. ... To represent a sentence pair, we use two features. Given two skip-thought vectors u and v, we compute their component-wise product u · v and their absolute difference \|u − v\| and concatenate them together. ... In our experiments, we use a 1000 dimensional embedding, margin α = 0.2 and k = 50 contrastive terms. We trained for 15 epochs and saved our model anytime the performance improved on the development set. |
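
The sentence-pair features quoted in the Experiment Setup row (component-wise product concatenated with the absolute difference of two skip-thought vectors) are simple to reproduce. A minimal sketch in NumPy, with the 4-dimensional vectors chosen here purely for illustration (the paper's skip-thought vectors are much higher dimensional):

```python
import numpy as np

def pair_features(u, v):
    """Feature vector for a sentence pair, following the paper's recipe:
    concatenate the component-wise product u * v with the absolute
    difference |u - v|."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.concatenate([u * v, np.abs(u - v)])

# Hypothetical low-dimensional skip-thought vectors for illustration.
u = np.array([1.0, -2.0, 0.5, 3.0])
v = np.array([2.0, 1.0, 0.5, -1.0])
feats = pair_features(u, v)  # shape (8,): product features then difference features
```

The resulting features are then fed to a linear model (e.g. ridge regression for SICK relatedness), which is the paper's evaluation protocol.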
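
The training details in the same row mention clipping gradients when a norm exceeds 10. A hedged sketch of standard global-norm gradient clipping, assuming the threshold-of-10 interpretation; the helper name `clip_gradients` is ours, not from the paper:

```python
import numpy as np

def clip_gradients(grads, max_norm=10.0):
    """Global-norm clipping: if the joint L2 norm of all gradients
    exceeds max_norm, rescale every gradient by max_norm / total_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A single gradient with norm 50 gets rescaled to norm 10.
clipped = clip_gradients([np.array([30.0, 40.0])], max_norm=10.0)
```

The clipped gradients would then be passed to the optimizer update (the paper uses Adam).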