From Word Embeddings To Document Distances

Authors: Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates."
Researcher Affiliation | Academia | Matt J. Kusner (MKUSNER@WUSTL.EDU), Yu Sun (YUSUN@WUSTL.EDU), Nicholas I. Kolkin (N.KOLKIN@WUSTL.EDU), Kilian Q. Weinberger (KILIAN@WUSTL.EDU), Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Matlab code for the WMD metric is available at http://matthewkusner.com
Open Datasets | Yes | "We evaluate all approaches on 8 supervised document datasets: BBCSPORT: BBC sports articles between 2004-2005, TWITTER: a set of tweets labeled with sentiments 'positive', 'negative', or 'neutral' (Sanders, 2011)... REUTERS: a classic news dataset labeled by news topics (we use the 8-class version with train/test split as described in Cardoso-Cachopo (2007))... and 20NEWS: news articles classified into 20 different categories (we use the bydate train/test split, Cardoso-Cachopo (2007))."
Dataset Splits | Yes | "For all algorithms we split the training set into an 80/20 train/validation split for hyper-parameter tuning."
Hardware Specification | Yes | "All speedups are reported relative to the time required for the exhaustive WMD metric (very top of the figure) and were run in parallel on 4 cores (8 cores for 20NEWS) of an Intel L5520 CPU with 2.27 GHz clock frequency."
Software Dependencies | No | The paper mentions 'Matlab code' and the 'Matlab Topic Modeling Toolbox' but does not specify software versions for either.
Experiment Setup | Yes | "For all algorithms we split the training set into an 80/20 train/validation split for hyper-parameter tuning. All free hyperparameters were set with Bayesian optimization for all algorithms (Snoek et al., 2012). ...WMD has no hyperparameters and thus we only optimize the neighborhood size (k ∈ {1, ..., 19}) of k-NN."
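For readers attempting to reproduce the WMD metric itself: the paper defines it as the minimum cumulative cost of moving the normalized bag-of-words (nBOW) mass of one document onto another, where the cost of moving between two words is the Euclidean distance between their embeddings. The authors' released code is in Matlab; the sketch below is an illustrative Python reimplementation (the function name `wmd` and the toy inputs are ours, not from the paper) that solves the underlying transportation problem directly as a linear program.

```python
import numpy as np
from scipy.optimize import linprog


def wmd(X1, w1, X2, w2):
    """Word Mover's Distance between two documents.

    X1 (n x d) and X2 (m x d) hold the word embeddings of each
    document's unique words; w1 (n,) and w2 (m,) are the nBOW
    weights, each summing to 1.
    """
    n, m = len(w1), len(w2)
    # Cost of moving word i of doc 1 onto word j of doc 2:
    # Euclidean distance between their embeddings.
    C = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)

    # Equality constraints of the transportation LP: every row of
    # the flow matrix T sums to w1[i], every column sums to w2[j].
    A_eq, b_eq = [], []
    for i in range(n):
        row = np.zeros((n, m))
        row[i, :] = 1.0
        A_eq.append(row.ravel())
        b_eq.append(w1[i])
    for j in range(m):
        col = np.zeros((n, m))
        col[:, j] = 1.0
        A_eq.append(col.ravel())
        b_eq.append(w2[j])

    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun
```

Moving all mass from one word vector to another reduces to the plain embedding distance, and a document has distance zero to itself; both make quick sanity checks for any reimplementation.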
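The experiment-setup row above describes an 80/20 train/validation split used to pick the k-NN neighborhood size. The paper tunes hyperparameters with Bayesian optimization, but for the single integer parameter k ∈ {1, ..., 19} a plain validation grid is equivalent in effect; the sketch below (function names `knn_predict` and `tune_k` are ours, and any distance function, including a WMD implementation, can be passed as `metric`) shows that protocol under those assumptions.

```python
import numpy as np


def knn_predict(D, y_train, k):
    """Majority vote over the k nearest training points, given a
    precomputed distance matrix D of shape (n_test, n_train)."""
    idx = np.argsort(D, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[idx]])


def tune_k(X, y, metric, ks=range(1, 20), seed=0):
    """Split the training data 80/20 into train/validation, then
    return the k with the lowest validation error."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    cut = int(0.8 * len(X))
    tr, va = perm[:cut], perm[cut:]
    # Pairwise distances from each validation point to each training point.
    D = np.array([[metric(xv, xt) for xt in X[tr]] for xv in X[va]])
    errs = {k: np.mean(knn_predict(D, y[tr], k) != y[va]) for k in ks}
    return min(errs, key=errs.get)
```

On the paper's datasets, `metric` would be the WMD between two documents' nBOW representations; the selected k is then used for the final test-set evaluation.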