Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding
Authors: Xiaolong Wang, Jingjing Wang, Chengxiang Zhai
AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on text classification and word embedding learning demonstrate that DCME effectively strikes a balance between training speed and model quality, substantially outperforming state-of-the-art methods. We conduct experiments on tasks of text classification and word embedding, evaluating the proposed DCME approach by examining its computational and learning efficiency. |
| Researcher Affiliation | Academia | University of Illinois, Urbana, IL 61801; {xwang95, jwang112, czhai}@illinois.edu |
| Pseudocode | Yes | Algorithm 1: DCME algorithm |
| Open Source Code | Yes | Our code is implemented in C and available for download at: https://github.com/dragonxlwang/dcme |
| Open Datasets | Yes | A public dataset ACM Digital Library is investigated. It has 162,460 papers published at 1,236 conferences. We hold out 10% of the documents for testing. Each paper is represented by the word count features of the top 30,000 frequent words. (Footnote 1: http://dl.acm.org/). For the word embedding task, we explore the New York Times (NYT) corpus from the English Gigaword (Fifth Edition). It has a total of 1.35 billion words with 10.84 million unique terms. (Footnote 2: https://catalog.ldc.upenn.edu/LDC2011T07). |
| Dataset Splits | No | We hold out 10% of the documents for testing. To assess the performance, a randomly sampled 1 × 10⁻⁴ of the text is withheld for testing. |
| Hardware Specification | Yes | All the algorithms are run with 20 threads in parallel on a 64-bit Linux machine with the Intel Xeon 3.60GHz CPU (20 core). |
| Software Dependencies | No | Our code is implemented in C and available for download at: https://github.com/dragonxlwang/dcme |
| Experiment Setup | Yes | In order for DCME and the sampling-based approaches to have comparable training speed, we set both the cluster number K of DCME and the sampling number of NCE and NS to 20, and also control the interval between offline updates in DCME with β = 1. Two variants of DCME, DCME-Q0 and DCME-Q10, are developed, the latter of which applies the online/offline tuning with Q = 10. All the algorithms are run with 20 threads in parallel on a 64-bit Linux machine with the Intel Xeon 3.60GHz CPU (20 core). We train the word embeddings using CBOW with a context window size of 10 and embedding dimensionality of 100. We evaluate the trained word embeddings after 15 epochs. |
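
As a quick reference, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below is a minimal illustration in C; the struct and field names are assumptions for readability and do not necessarily match the actual options of the dcme code.

```c
/*
 * Minimal sketch (not the authors' code): the hyperparameters quoted in the
 * Experiment Setup row, collected into one struct. Field names are
 * illustrative assumptions and may not match the options exposed by
 * https://github.com/dragonxlwang/dcme.
 */
#include <stdio.h>

typedef struct {
  int    cluster_num;  /* K: number of clusters used by DCME               */
  int    sample_num;   /* number of samples for the NCE / NS baselines     */
  double beta;         /* controls the interval between offline updates    */
  int    q;            /* online/offline tuning parameter (DCME-Q10)       */
  int    thread_num;   /* parallel training threads                        */
  int    window;       /* CBOW context window size                         */
  int    dim;          /* embedding dimensionality                         */
  int    epochs;       /* epochs trained before evaluation                 */
} ExperimentConfig;

int main(void) {
  /* Values taken from the quoted experiment setup. */
  ExperimentConfig cfg = {
      .cluster_num = 20, .sample_num = 20, .beta = 1.0, .q = 10,
      .thread_num = 20,  .window = 10,     .dim = 100,  .epochs = 15};
  printf("K=%d samples=%d beta=%.1f Q=%d threads=%d window=%d dim=%d epochs=%d\n",
         cfg.cluster_num, cfg.sample_num, cfg.beta, cfg.q,
         cfg.thread_num, cfg.window, cfg.dim, cfg.epochs);
  return 0;
}
```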
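
For orientation, the splits quoted in the Open Datasets and Dataset Splits rows imply roughly the following test-set sizes. This is only back-of-the-envelope arithmetic over the reported corpus sizes; the paper does not state exact held-out counts.

```c
/* Back-of-the-envelope test-set sizes implied by the quoted splits.
 * Corpus sizes are the figures reported above; actual counts depend on
 * how the authors sampled and rounded. */
#include <stdio.h>

int main(void) {
  const double acm_docs  = 162460.0;  /* ACM Digital Library papers          */
  const double nyt_words = 1.35e9;    /* NYT (Gigaword, Fifth Ed.) words     */
  printf("ACM test documents (10%% held out): ~%.0f\n", acm_docs * 0.10);
  printf("NYT test words (1e-4 withheld):     ~%.0f\n", nyt_words * 1e-4);
  return 0;
}
```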