From Word Embeddings To Document Distances
Authors: Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, we demonstrate on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedented low k-nearest neighbor document classification error rates. |
| Researcher Affiliation | Academia | Matt J. Kusner MKUSNER@WUSTL.EDU Yu Sun YUSUN@WUSTL.EDU Nicholas I. Kolkin N.KOLKIN@WUSTL.EDU Kilian Q. Weinberger KILIAN@WUSTL.EDU Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63130 |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Matlab code for the WMD metric is available at http://matthewkusner.com |
| Open Datasets | Yes | We evaluate all approaches on 8 supervised document datasets: BBCSPORT: BBC sports articles between 2004-2005, TWITTER: a set of tweets labeled with sentiments 'positive', 'negative', or 'neutral' (Sanders, 2011)... REUTERS: a classic news dataset labeled by news topics (we use the 8-class version with train/test split as described in Cardoso-Cachopo (2007))... and 20NEWS: news articles classified into 20 different categories (we use the bydate train/test split, Cardoso-Cachopo (2007)). |
| Dataset Splits | Yes | For all algorithms we split the training set into an 80/20 train/validation for hyper-parameter tuning. |
| Hardware Specification | Yes | All speedups are reported relative to the time required for the exhaustive WMD metric (very top of the figure) and were run in parallel on 4 cores (8 cores for 20NEWS) of an Intel L5520 CPU with 2.27 GHz clock frequency. |
| Software Dependencies | No | The paper mentions 'Matlab code' and 'Matlab Topic Modeling Toolbox' but does not specify software versions for either. |
| Experiment Setup | Yes | For all algorithms we split the training set into an 80/20 train/validation for hyper-parameter tuning. All free hyperparameters were set with Bayesian optimization for all algorithms (Snoek et al., 2012). ...WMD have no hyperparameters and thus we only optimize the neighborhood size (k ∈ {1, ..., 19}) of kNN. |
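For intuition about what the reviewed paper computes: WMD casts document distance as an optimal-transport problem over word embeddings. The sketch below is not the authors' Matlab code; it assumes toy 2-D embeddings (invented for illustration) and uniform word weights over equal-length documents, in which case the transport problem reduces to a minimum-cost matching that a brute-force search over permutations can solve for tiny inputs.

```python
from itertools import permutations
from math import dist

# Hypothetical 2-D "embeddings" (toy values, NOT real word2vec vectors)
embeddings = {
    "obama": (1.0, 0.2), "speaks": (0.3, 0.9),
    "president": (0.9, 0.3), "greets": (0.4, 0.8),
    "media": (0.1, 0.5), "press": (0.2, 0.5),
    "illinois": (0.7, 0.1), "chicago": (0.8, 0.15),
}

def wmd_uniform(doc_a, doc_b):
    """Word Mover's Distance for two equal-length documents with uniform
    word weights: with uniform weights the transport polytope's vertices
    are (scaled) permutation matrices, so the minimum-cost flow is a
    minimum-cost word-to-word matching. Brute force is fine at toy scale."""
    assert len(doc_a) == len(doc_b), "this sketch assumes equal lengths"
    n = len(doc_a)
    best = float("inf")
    for perm in permutations(range(n)):
        # Average Euclidean distance between matched word embeddings
        cost = sum(
            dist(embeddings[doc_a[i]], embeddings[doc_b[perm[i]]])
            for i in range(n)
        ) / n
        best = min(best, cost)
    return best

d1 = ["obama", "speaks", "media", "illinois"]
d2 = ["president", "greets", "press", "chicago"]
print(wmd_uniform(d1, d2))
```

A real implementation would use nBOW frequencies as transport weights and an exact EMD solver (the paper's experiments use the authors' Matlab code); for documents of differing lengths the matching shortcut above no longer applies.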