Efficient Vector Representation for Documents through Corruption

Authors: Minmin Chen

Venue: ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Doc2VecC on a sentiment analysis task, a document classification task and a semantic relatedness task, along with several document representation learning algorithms."
Researcher Affiliation | Industry | Minmin Chen, Criteo Research, Palo Alto, CA 94301, USA, m.chen@criteo.com
Pseudocode | No | The paper provides mathematical derivations and descriptions but does not include any structured pseudocode or algorithm blocks. (A rough sketch of the described corruption step follows this table.)
Open Source Code | Yes | "All experiments can be reproduced using the code available at https://github.com/mchen24/iclr2017"
Open Datasets | Yes | "For sentiment analysis, we use the IMDB movie review dataset. It comes with predefined train/test split (Maas et al., 2011)... We test Doc2VecC on the SemEval 2014 Task 1: semantic relatedness SICK dataset (Marelli et al., 2014)."
Dataset Splits | Yes | "The hyper-parameters are tuned on a validation set subsampled from the training set. ... The set is split into a training set of 4,500 instances, a validation set of 500, and a test set of 4,927."
Hardware Specification | Yes | "The experiments were conducted on a desktop with Intel i7 2.2GHz CPU."
Software Dependencies | No | The paper mentions using a "linear support vector machine (SVM)" and "t-SNE" for analysis but does not provide version numbers for any software libraries or tools. (An illustrative SVM evaluation appears after this table.)
Experiment Setup | Yes | "We remove words that appear less than 10 times in the training set... A vector of 4800 dimensions... are generated for each document. In comparison, all the other algorithms produce a vector representation of size 100. ... we used q = 0.9 throughout the experiments. ... We used a cutoff of 100 in this experiment. ... we applied the trick of subsampling of frequent words introduced in (Mikolov & Dean, 2013)... Given the sentence embeddings, we used the exact same training and testing protocol as in (Kiros et al., 2015)..." (A preprocessing sketch follows this table.)
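
Since the paper itself contains no pseudocode, the following is a minimal Python sketch, assuming a NumPy embedding matrix, of the corruption step it describes: each word of a document is dropped with probability q (the quoted experiments use q = 0.9) and the survivors are re-scaled by 1/(1 - q), so the corrupted average of word embeddings is an unbiased estimate of the plain average used at test time. The function names here are illustrative assumptions; the author's released implementation at https://github.com/mchen24/iclr2017 is separate C code.

```python
import numpy as np

def corrupted_doc_vector(word_ids, embeddings, q=0.9, rng=None):
    """Training-time document vector under the corruption the paper
    describes: drop each word with probability q, then re-scale the
    survivors by 1 / (1 - q).  In expectation this equals the plain
    average computed by doc_vector below.  (Illustrative sketch, not
    the author's released code.)"""
    rng = rng or np.random.default_rng()
    word_ids = np.asarray(word_ids)
    keep = rng.random(word_ids.size) >= q            # keep each word w.p. 1 - q
    kept = word_ids[keep]
    if kept.size == 0:                               # degenerate case: all words dropped
        return np.zeros(embeddings.shape[1])
    return embeddings[kept].sum(axis=0) / ((1.0 - q) * word_ids.size)

def doc_vector(word_ids, embeddings):
    """Test-time representation: the exact average of the word embeddings."""
    return embeddings[np.asarray(word_ids)].mean(axis=0)
```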
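The quoted setup removes words that appear fewer than 10 times in the training set and applies the subsampling of frequent words from Mikolov & Dean (2013). A rough sketch of that preprocessing is below, assuming a word w is kept with probability min(1, sqrt(t / f(w))) for corpus frequency f(w); the threshold t is an assumed value, as the quoted text does not state it, and the helper name is hypothetical.

```python
import math
import random
from collections import Counter

def preprocess(docs, min_count=10, t=1e-4, rng=random.Random(0)):
    """Drop words with fewer than min_count training occurrences, then
    subsample frequent words: keep word w with probability
    min(1, sqrt(t / f(w))), Mikolov et al.'s heuristic.  min_count=10
    matches the quoted setup; t is an assumed threshold."""
    counts = Counter(w for doc in docs for w in doc)
    kept_counts = {w: c for w, c in counts.items() if c >= min_count}
    total = sum(kept_counts.values())
    def keep_prob(w):
        # f(w) = kept_counts[w] / total, so sqrt(t / f(w)) = sqrt(t * total / count)
        return min(1.0, math.sqrt(t * total / kept_counts[w]))
    return [
        [w for w in doc if w in kept_counts and rng.random() < keep_prob(w)]
        for doc in docs
    ]
```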
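The paper classifies the learned document vectors with a linear SVM but names no library or version. As a hypothetical present-day equivalent, an evaluation with scikit-learn's LinearSVC might look like the following; the file names, vector dimensionality, and default hyper-parameters are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical arrays: one learned Doc2VecC vector per document,
# e.g. 100-dimensional as in the quoted comparison setting.
train_X = np.load("train_vectors.npy")   # shape (n_train, 100)
train_y = np.load("train_labels.npy")
test_X = np.load("test_vectors.npy")
test_y = np.load("test_labels.npy")

clf = LinearSVC()                        # regularization left at the library default
clf.fit(train_X, train_y)
print("test accuracy:", clf.score(test_X, test_y))
```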