Skip-Thought Vectors

Authors: Ryan Kiros, Yukun Zhu, Russ R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, Sanja Fidler

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets.
Researcher Affiliation | Academia | University of Toronto; Canadian Institute for Advanced Research; Massachusetts Institute of Technology
Pseudocode | No | The paper includes mathematical equations for the encoder, decoder, and objective function, but no formally labeled 'Pseudocode' or 'Algorithm' block.
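Although the row above notes that the paper gives equations rather than pseudocode, the model is compact enough to sketch. The snippet below is a hedged illustration only, not the authors' implementation: it uses GRU encoder/decoders, conditions the decoders by concatenating the thought vector with the decoder inputs (the paper conditions the GRU gates directly), and the embedding and hidden sizes are placeholder values.

```python
# Hedged sketch of a skip-thought style encoder/decoder (not the authors' code).
# Assumptions: GRU layers, conditioning by concatenation rather than the
# gate-level conditioning in the paper, and placeholder layer sizes.
import torch
import torch.nn as nn

class SkipThoughtSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Two decoders: one reconstructs the previous sentence, one the next.
        self.dec_prev = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.dec_next = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, cur, prev, nxt):
        # Encode the current sentence; the final hidden state is the
        # "skip-thought" vector used downstream as the sentence feature.
        _, h = self.encoder(self.embed(cur))            # (1, B, hid_dim)
        thought = h.transpose(0, 1)                     # (B, 1, hid_dim)

        def decode(decoder, target):
            emb = self.embed(target)                    # (B, T, emb_dim)
            cond = thought.expand(-1, emb.size(1), -1)  # broadcast over time
            hidden, _ = decoder(torch.cat([emb, cond], dim=-1))
            return self.out(hidden)                     # (B, T, vocab_size)

        # Training would sum cross-entropy losses over both outputs,
        # mirroring the paper's sum of log-probabilities objective.
        return decode(self.dec_prev, prev), decode(self.dec_next, nxt)
```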
Open Source Code | No | The paper refers to 'The publically available CBOW word vectors are used for this purpose' (footnote: http://code.google.com/p/word2vec/). This is a third-party tool used by the authors, not the open-source code for the methodology described in this paper.
Open Datasets | Yes | We chose to use a large collection of novels, namely the BookCorpus dataset [9] for training our models. ... Our first experiment is on the SemEval 2014 Task 1: semantic relatedness SICK dataset [30]. ... The next task we consider is paraphrase detection on the Microsoft Research Paraphrase Corpus [31]. ... For this experiment, we use the Microsoft COCO dataset [35]. ... We use 5 datasets: movie review sentiment (MR), customer product reviews (CR), subjectivity/objectivity classification (SUBJ), opinion polarity (MPQA) and question-type classification (TREC).
Dataset Splits | Yes | The dataset comes with a predefined split of 4500 training pairs, 500 development pairs and 4927 testing pairs. ... 10-fold cross-validation is used for evaluation on the first 4 datasets, while TREC has a pre-defined train/test split. We tune the L2 penalty using cross-validation (and thus use a nested cross-validation for the first 4 datasets).
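The protocol quoted in the row above (10-fold evaluation with an inner cross-validation to tune the L2 penalty) can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the scikit-learn estimators, the penalty grid, and the placeholder features X and labels y are choices made here for clarity.

```python
# Hedged sketch of the nested cross-validation protocol quoted above.
# X and y are placeholders; the estimator, penalty grid, and library choice
# (scikit-learn) are assumptions, not the authors' exact setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X = np.random.randn(200, 4800)          # placeholder sentence features
y = np.random.randint(0, 2, size=200)   # placeholder binary labels

# Inner loop: tune the L2 penalty (C is the inverse regularization strength).
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)

# Outer loop: 10-fold cross-validation gives the reported score.
scores = cross_val_score(inner, X, y, cv=10)
print("nested 10-fold accuracy: %.3f" % scores.mean())
```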
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments are provided in the paper. It only mentions 'Mini-batches of size 128 are used' and that models were trained for 'roughly two weeks', without specifying the underlying hardware.
Software Dependencies | No | The paper mentions the 'Adam algorithm [17]' for optimization and 'CBOW word vectors' from word2vec (linking to http://code.google.com/p/word2vec/), but does not provide specific version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | Yes | Mini-batches of size 128 are used and gradients are clipped if the norm of the parameter vector exceeds 10. We used the Adam algorithm [17] for optimization. Both models were trained for roughly two weeks. ... To represent a sentence pair, we use two features. Given two skip-thought vectors u and v, we compute their component-wise product u · v and their absolute difference |u − v| and concatenate them together. ... In our experiments, we use a 1000 dimensional embedding, margin α = 0.2 and k = 50 contrastive terms. We trained for 15 epochs and saved our model anytime the performance improved on the development set.
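The sentence-pair features quoted in the row above (component-wise product and absolute difference, concatenated) are straightforward to reproduce; a minimal sketch, assuming NumPy arrays and placeholder vector dimensions, is shown below.

```python
# Minimal sketch of the sentence-pair features described above; the vector
# dimension and random contents are placeholders.
import numpy as np

def pair_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Concatenate the component-wise product u * v and |u - v|."""
    return np.concatenate([u * v, np.abs(u - v)])

u = np.random.randn(2400)   # placeholder skip-thought vector, sentence A
v = np.random.randn(2400)   # placeholder skip-thought vector, sentence B
features = pair_features(u, v)
print(features.shape)       # (4800,) -- input to a linear classifier/regressor
```

For the training hyperparameters quoted in the same row, the gradient clipping and Adam settings map onto standard utilities (e.g., torch.nn.utils.clip_grad_norm_ with max_norm=10 and torch.optim.Adam), though the authors' actual training code is not specified in the paper.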