reproducibilityindex.ai

Efficient Vector Representation for Documents through Corruption

Authors: Minmin Chen

ICLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate Doc2VecC on a sentiment analysis task, a document classification task and a semantic relatedness task, along with several document representation learning algorithms.
Researcher Affiliation	Industry	Minmin Chen, Criteo Research, Palo Alto, CA 94301, USA, m.chen@criteo.com
Pseudocode	No	The paper provides mathematical derivations and descriptions but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	All experiments can be reproduced using the code available at https://github.com/mchen24/iclr2017
Open Datasets	Yes	For sentiment analysis, we use the IMDB movie review dataset. It comes with predefined train/test split (Maas et al., 2011)... We test Doc2VecC on the SemEval 2014 Task 1: semantic relatedness SICK dataset (Marelli et al., 2014).
Dataset Splits	Yes	The hyper-parameters are tuned on a validation set subsampled from the training set. ... The set is splitted into a training set of 4,500 instances, a validation set of 500, and a test set of 4,927.
Hardware Specification	Yes	The experiments were conducted on a desktop with Intel i7 2.2Ghz cpu.
Software Dependencies	No	The paper mentions using a 'linear support vector machine (SVM)' and 't-SNE' for analysis but does not provide specific version numbers for any software libraries or tools.
Experiment Setup	Yes	We remove words that appear less than 10 times in the training set... A vector of 4800 dimensions... are generated for each document. In comparison, all the other algorithms produce a vector representation of size 100. ...we used q = 0.9 throughout the experiments. ... We used a cutoff of 100 in this experiment. ...we applied the trick of subsampling of frequent words introduced in (Mikolov & Dean, 2013)... Given the sentence embeddings, we used the exact same training and testing protocol as in (Kiros et al., 2015)...
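
The preprocessing quoted in the Experiment Setup and Dataset Splits rows (dropping words that appear fewer than 10 times in the training set, and holding out a validation set subsampled from the training set) could be sketched roughly as follows. This is a minimal illustration, not the code from the authors' repository. The function names, the pre-tokenized input format, the fixed seed, and the choice to count total occurrences rather than document frequency are all assumptions made for the example.

```python
import random
from collections import Counter


def build_vocab(train_docs, min_count=10):
    """Keep tokens that occur at least `min_count` times in the training set.

    Assumption: `train_docs` is a list of token lists, and the cutoff is
    applied to total occurrences (not to document frequency).
    """
    counts = Counter(tok for doc in train_docs for tok in doc)
    return {tok for tok, c in counts.items() if c >= min_count}


def filter_docs(docs, vocab):
    """Drop out-of-vocabulary tokens from every document."""
    return [[tok for tok in doc if tok in vocab] for doc in docs]


def split_validation(train_docs, n_valid, seed=0):
    """Subsample `n_valid` documents from the training set for validation.

    Assumption: a uniform random subsample with a fixed seed; the quoted
    text does not say how the validation set was drawn.
    """
    idx = list(range(len(train_docs)))
    random.Random(seed).shuffle(idx)
    valid = [train_docs[i] for i in idx[:n_valid]]
    train = [train_docs[i] for i in idx[n_valid:]]
    return train, valid
```

The vocabulary is built from the training documents only and then applied to every split, so that word statistics from validation or test data do not influence the cutoff.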