Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding

Authors: Xiaolong Wang, Jingjing Wang, Chengxiang Zhai

AAAI 2017

Reproducibility assessment. Each entry below lists the reproducibility variable, its assessed result, and the supporting LLM response.

Research Type: Experimental
LLM Response: Experimental studies on text classification and word embedding learning demonstrate that DCME effectively strikes a balance between training speed and model quality, substantially outperforming state-of-the-art methods. We conduct experiments on tasks of text classification and word embedding, evaluating the proposed DCME approach by examining its computational and learning efficiency.

Researcher Affiliation: Academia
LLM Response: University of Illinois, Urbana, IL 61801; {xwang95, jwang112, czhai}@illinois.edu

Pseudocode: Yes
LLM Response: Algorithm 1: DCME algorithm

Open Source Code: Yes
LLM Response: Our code is implemented in C and available for download at: https://github.com/dragonxlwang/dcme

Open Datasets: Yes
LLM Response: A public dataset, the ACM Digital Library, is investigated. It has 162,460 papers published at 1,236 conferences. We hold out 10% of the documents for testing. Each paper is represented by the word-count features of the top 30,000 frequent words. (Footnote 1: http://dl.acm.org/). For the word embedding task, we explore the New York Times (NYT) corpus from the English Gigaword (Fifth Edition). It has a total of 1.35 billion words with 10.84 million unique terms. (Footnote 2: https://catalog.ldc.upenn.edu/LDC2011T07).

Dataset Splits: No
LLM Response: We hold out 10% of the documents for testing. To assess the performance, a randomly sampled 1 × 10⁻⁴ fraction of the text is withheld for testing. (A data-preparation sketch based on this and the Open Datasets entry follows the table.)

Hardware Specification: Yes
LLM Response: All the algorithms are run with 20 threads in parallel on a 64-bit Linux machine with an Intel Xeon 3.60 GHz CPU (20 cores).

Software Dependencies: No
LLM Response: Our code is implemented in C and available for download at: https://github.com/dragonxlwang/dcme

Experiment Setup: Yes
LLM Response: In order for DCME and the sampling-based approaches to have comparable training speed, we set both the cluster number K of DCME and the sampling number of NCE and NS to 20, and also control the interval between offline updates in DCME with β = 1. Two variants of DCME, DCME-Q0 and DCME-Q10, are developed, the latter of which applies the online/offline tuning with Q = 10. All the algorithms are run with 20 threads in parallel on a 64-bit Linux machine with an Intel Xeon 3.60 GHz CPU (20 cores). We train the word embeddings using CBOW with a context window size of 10 and embedding dimensionality of 100. We evaluate the trained word embeddings after 15 epochs. (Sketches of the exact softmax objective these methods approximate and of this configuration follow the table.)

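For concreteness, the following is a minimal Python sketch of the data preparation described under Open Datasets and Dataset Splits: each document is represented by word counts over the top 30,000 frequent words, and 10% of the documents are held out for testing. The function names and toy documents are illustrative assumptions; the excerpt does not specify tokenization or the exact sampling procedure.

```python
# Sketch of the data preparation described in the report. Assumptions:
# tokenization and the exact hold-out sampling are not specified in the
# excerpt; the 90/10 split and the top-30,000 vocabulary are.
from collections import Counter
import random

def build_vocab(docs, vocab_size=30000):
    """Keep the top `vocab_size` most frequent words as features."""
    counts = Counter(w for doc in docs for w in doc)
    return {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}

def word_count_features(doc, vocab):
    """Represent a document by word counts over the fixed vocabulary."""
    feats = Counter(vocab[w] for w in doc if w in vocab)
    return dict(feats)  # sparse {feature_index: count}

def split_holdout(docs, test_fraction=0.10, seed=0):
    """Hold out `test_fraction` of the documents for testing."""
    rng = random.Random(seed)
    docs = docs[:]
    rng.shuffle(docs)
    n_test = int(len(docs) * test_fraction)
    return docs[n_test:], docs[:n_test]  # train, test

# Toy usage with stand-in documents (the ACM corpus itself is not bundled here).
docs = [["maximum", "entropy", "model"], ["word", "embedding", "model"],
        ["dual", "clustering", "entropy"], ["text", "classification", "task"]]
train, test = split_holdout(docs, test_fraction=0.25)  # 10% in the paper
vocab = build_vocab(train, vocab_size=30000)
features = [word_count_features(d, vocab) for d in train]
print(len(train), len(test), len(vocab))
```
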
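The comparison with NCE and NS in the Experiment Setup entry concerns approximations of the exact maximum-entropy (softmax) probability. As a hedged illustration only, the sketch below computes that exact probability; its denominator ranges over every output class (1,236 conference labels for classification, roughly 10.84 million vocabulary words for embeddings), which is the cost that sampling baselines and DCME's clustering are designed to avoid. The dual-clustering approximation itself is described in the paper and is not reproduced here.

```python
# Minimal sketch of the exact maximum-entropy (softmax) probability that
# sampling methods (NCE, NS) and DCME approximate; this is not the DCME
# algorithm, only the objective being approximated.
import math

def softmax_prob(scores, y):
    """Exact p(y | x): the denominator sums over every output class,
    which is the expensive step when there are millions of classes."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[y] / sum(exps)

# Toy example with 5 classes; in the paper the sum ranges over 1,236
# conference labels (classification) or ~10.84 million words (embeddings).
scores = [1.2, 0.3, -0.5, 2.0, 0.1]      # score of each class y given input x
print(softmax_prob(scores, 3))
```
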
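Finally, a hypothetical configuration record collecting the hyperparameters quoted in the Experiment Setup entry. The field names are illustrative only and do not correspond to documented options of the dcme implementation.

```python
# Hypothetical configuration summarizing the quoted hyperparameters; the keys
# are illustrative and are not flags of the actual dcme code.
dcme_experiment = {
    "num_clusters_K": 20,        # cluster number K of DCME
    "negative_samples": 20,      # sampling number for the NCE / NS baselines
    "offline_update_beta": 1.0,  # controls the interval between offline updates
    "online_offline_Q": [0, 10], # DCME-Q0 and DCME-Q10 variants
    "threads": 20,               # all algorithms run with 20 threads in parallel
    "cbow_window": 10,           # CBOW context window size
    "embedding_dim": 100,        # embedding dimensionality
    "epochs": 15,                # embeddings evaluated after 15 epochs
}

for name, value in dcme_experiment.items():
    print(f"{name}: {value}")
```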