Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages
Authors: Michelle Yuan, Benjamin Van Durme, Jordan Boyd-Graber
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We test our algorithms on labeled English, Chinese, and Sinhalese documents. Within minutes, our methods can produce interpretable topics that are useful for specific classification tasks." and "We run experiments to evaluate three methods: multilingual anchoring, MTAnchor, and MCTA (Multilingual Cultural-common Topic Analysis) [33]." |
| Researcher Affiliation | Academia | Michelle Yuan University of Maryland myuan@cs.umd.edu Benjamin Van Durme Johns Hopkins University vandurme@jhu.edu Jordan Boyd-Graber University of Maryland jbg@umiacs.umd.edu |
| Pseudocode | No | The paper describes algorithms like 'Recover L2' and 'Fast Anchor Words' conceptually but does not include structured pseudocode or algorithm blocks clearly labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | http://github.com/forest-snow/mtanchor_demo |
| Open Datasets | Yes | "The first dataset consists of Wikipedia articles: 11,043 in English and 10,135 in Chinese." and "Another dataset consists of Amazon reviews: 53,558 in English and 53,160 in Chinese (mostly from Taiwan) [30]." and "To test low-resource languages, we use data from the LORELEI Sinhalese language pack [31]." |
| Dataset Splits | Yes | "For the Wikipedia and Amazon datasets, the training-test split is set to 80:20." and "During the user study, we hold out 100 documents as a development set for each corpus." |
| Hardware Specification | Yes | All methods are implemented in Python on a 2.3 GHz Intel Core i5 processor. |
| Software Dependencies | No | The paper mentions 'implemented in Python' and uses tools like 'WordNet Lemmatizer [28]', 'Stanford CoreNLP [29]', and 'LIBLINEAR [27]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | "We train models on multilingual anchoring and MCTA with twenty topics." and "To infer the topic distribution of documents, we pass in the topic matrices as inputs into variational inference [18], where topic variational parameter β is fixed and only document variational parameter γ is fitted. Then, we train a linear SVM on the topic distributions of documents [27] to classify document labels." |
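The 'Fast Anchor Words' routine noted in the Pseudocode row can be sketched as a greedy farthest-point search over a row-normalized word co-occurrence matrix. This is a hedged illustration of the general anchor-words idea, not the authors' released code; the function name `gram_schmidt_anchors` and the variables `Q` and `k` are names chosen here.

```python
import numpy as np

def gram_schmidt_anchors(Q, k):
    """Greedily select k anchor rows of a word co-occurrence matrix Q:
    repeatedly pick the row farthest from the span of the rows chosen
    so far, using stabilized Gram-Schmidt projections."""
    Q = Q / Q.sum(axis=1, keepdims=True)   # row-normalize to conditional distributions
    residual = Q.copy()
    anchors = []
    for _ in range(k):
        # farthest remaining row from the span of the current anchors
        i = int(np.argmax((residual ** 2).sum(axis=1)))
        anchors.append(i)
        u = residual[i] / np.linalg.norm(residual[i])
        # project every row onto the orthogonal complement of u
        residual = residual - np.outer(residual @ u, u)
    return anchors
```

Rows that are extreme points of the simplex (words occurring almost exclusively with one topic) survive the projections with large residual norms, so they get picked as anchors; mixture rows shrink toward zero and are skipped.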
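The pipeline quoted in the Experiment Setup row — infer per-document topic proportions γ with the topic matrix β held fixed, then train a linear SVM on those proportions — can be sketched as below. This is a simplified EM-style fold-in rather than the paper's variational inference, and scikit-learn's LIBLINEAR-backed `LinearSVC` stands in for the LIBLINEAR setup cited as [27]; all function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def doc_topic_proportions(doc_term, beta, iters=100):
    """Estimate per-document topic proportions with the topic-word
    matrix `beta` (k x V, rows sum to 1) held fixed, via a simple
    EM fold-in on a document-term count matrix (n_docs x V)."""
    n_docs = doc_term.shape[0]
    k = beta.shape[0]
    gamma = np.full((n_docs, k), 1.0 / k)              # uniform init
    for _ in range(iters):
        # E-step: responsibility of topic t for word w in doc d
        resp = gamma[:, :, None] * beta[None, :, :]    # (docs, k, V)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate proportions from expected counts
        gamma = (resp * doc_term[:, None, :]).sum(axis=2)
        gamma /= gamma.sum(axis=1, keepdims=True) + 1e-12
    return gamma

# usage: topic proportions as features for a linear document classifier
# gamma_train = doc_topic_proportions(X_train, beta)
# clf = LinearSVC().fit(gamma_train, y_train)
```

Holding β fixed keeps the document representation consistent between training and test corpora, so only the low-dimensional γ features need to be re-inferred for held-out documents before classification.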