Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages

Authors: Michelle Yuan, Benjamin Van Durme, Jordan Boyd-Graber

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We test our algorithms on labeled English, Chinese, and Sinhalese documents. Within minutes, our methods can produce interpretable topics that are useful for specific classification tasks." and "We run experiments to evaluate three methods: multilingual anchoring, MTAnchor, and MCTA (Multilingual Cultural-common Topic Analysis) [33]." |
| Researcher Affiliation | Academia | Michelle Yuan, University of Maryland, myuan@cs.umd.edu; Benjamin Van Durme, Johns Hopkins University, vandurme@jhu.edu; Jordan Boyd-Graber, University of Maryland, jbg@umiacs.umd.edu |
| Pseudocode | No | The paper describes algorithms such as RecoverL2 and FastAnchorWords conceptually but does not include structured pseudocode or clearly labeled algorithm blocks. (A hedged sketch of greedy anchor selection follows the table.) |
| Open Source Code | Yes | http://github.com/forest-snow/mtanchor_demo |
| Open Datasets | Yes | "The first dataset consists of Wikipedia articles: 11,043 in English and 10,135 in Chinese." and "Another dataset consists of Amazon reviews: 53,558 in English and 53,160 in Chinese (mostly from Taiwan) [30]." and "To test low-resource languages, we use data from the LORELEI Sinhalese language pack [31]." |
| Dataset Splits | Yes | "For the Wikipedia and Amazon datasets, the training-test split is set to 80:20." and "During the user study, we hold out 100 documents as a development set for each corpus." |
| Hardware Specification | Yes | "All methods are implemented in Python on a 2.3 GHz Intel Core i5 processor." |
| Software Dependencies | No | The paper notes that the methods are "implemented in Python" and cites tools such as the WordNet lemmatizer [28], Stanford CoreNLP [29], and LIBLINEAR [27], but gives no version numbers for these software components. |
| Experiment Setup | Yes | "We train models on multilingual anchoring and MCTA with twenty topics." and "To infer the topic distribution of documents, we pass in the topic matrices as inputs into variational inference [18], where topic variational parameter β is fixed and only document variational parameter γ is fitted. Then, we train a linear SVM on the topic distributions of documents [27] to classify document labels." (Sketched after the table.) |
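
Since the paper gives no formal pseudocode for the anchor-selection step it builds on, the following is a minimal sketch of greedy anchor selection in the spirit of FastAnchorWords, which the paper's multilingual anchoring extends by linking anchor words across languages through a bilingual dictionary. The function name is hypothetical, and the random-projection and exact span-distance details of the published algorithm are omitted; this is not the authors' implementation.

```python
import numpy as np

def select_anchors(Q, k):
    """Greedy anchor selection in the spirit of FastAnchorWords (simplified sketch).

    Q is assumed to be the row-normalized word co-occurrence matrix
    (one row per vocabulary word). At each step, pick the row farthest
    from the span of the rows chosen so far, then project that direction
    out of every row (Gram-Schmidt style).
    """
    residual = np.asarray(Q, dtype=float).copy()
    anchors = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(norms))          # row farthest from the current span
        anchors.append(idx)
        direction = residual[idx] / norms[idx]
        # Remove the chosen direction from every row so it cannot be picked again.
        residual -= np.outer(residual @ direction, direction)
    return anchors
```

Multilingual anchoring would additionally constrain the chosen anchors to words that have dictionary translations in both languages; that linking step is left out of this sketch.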
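
The "Experiment Setup" row describes inferring document-topic distributions with the topic matrix held fixed and then training a linear SVM (LIBLINEAR) on those distributions. Below is a hedged sketch of that classification pipeline using scikit-learn's LIBLINEAR-backed LinearSVC. The `document_topic_features` helper and the random stand-in data are hypothetical, and the simple normalization it performs is only a placeholder for the paper's variational inference with fixed β and fitted γ.

```python
import numpy as np
from sklearn.svm import LinearSVC  # scikit-learn's wrapper around LIBLINEAR

def document_topic_features(doc_word_counts, topic_word):
    """Hypothetical stand-in for the paper's inference step.

    The paper fixes the topic-word parameters (beta) and fits only the
    per-document parameters (gamma) via variational inference; here we
    simply score each document against each topic and normalize, purely
    for illustration.
    """
    scores = doc_word_counts @ topic_word.T        # (n_docs, n_topics)
    scores = np.maximum(scores, 1e-12)
    return scores / scores.sum(axis=1, keepdims=True)

# Hypothetical arrays standing in for the real corpora and learned topics.
rng = np.random.default_rng(0)
doc_word_counts = rng.poisson(0.2, size=(200, 500))   # 200 docs, 500-word vocabulary
topic_word = rng.dirichlet(np.ones(500), size=20)     # twenty topics, as in the paper
labels = rng.integers(0, 2, size=200)

features = document_topic_features(doc_word_counts, topic_word)
split = int(0.8 * len(features))                      # 80:20 train-test split, as reported
clf = LinearSVC()                                     # linear SVM, LIBLINEAR backend
clf.fit(features[:split], labels[:split])
print("test accuracy:", clf.score(features[split:], labels[split:]))
```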