Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages

Authors: Michelle Yuan, Benjamin Van Durme, Jordan Boyd-Graber

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We test our algorithms on labeled English, Chinese, and Sinhalese documents. Within minutes, our methods can produce interpretable topics that are useful for specific classification tasks." and "We run experiments to evaluate three methods: multilingual anchoring, MTAnchor, and MCTA (Multilingual Cultural-common Topic Analysis) [33]." |
| Researcher Affiliation | Academia | Michelle Yuan, University of Maryland, myuan@cs.umd.edu; Benjamin Van Durme, Johns Hopkins University, vandurme@jhu.edu; Jordan Boyd-Graber, University of Maryland, jbg@umiacs.umd.edu |
| Pseudocode | No | The paper describes algorithms such as RecoverL2 and FastAnchorWords conceptually but does not include structured pseudocode or clearly labeled algorithm blocks. (A hedged sketch of greedy anchor selection follows the table.) |
| Open Source Code | Yes | http://github.com/forest-snow/mtanchor_demo |
| Open Datasets | Yes | "The first dataset consists of Wikipedia articles: 11,043 in English and 10,135 in Chinese." and "Another dataset consists of Amazon reviews: 53,558 in English and 53,160 in Chinese (mostly from Taiwan) [30]." and "To test low-resource languages, we use data from the LORELEI Sinhalese language pack [31]." |
| Dataset Splits | Yes | "For the Wikipedia and Amazon datasets, the training-test split is set to 80:20." and "During the user study, we hold out 100 documents as a development set for each corpus." |
| Hardware Specification | Yes | "All methods are implemented in Python on a 2.3 GHz Intel Core i5 processor." |
| Software Dependencies | No | The paper notes that the methods are "implemented in Python" and cites tools such as the WordNet lemmatizer [28], Stanford CoreNLP [29], and LIBLINEAR [27], but gives no version numbers for these software components. |
| Experiment Setup | Yes | "We train models on multilingual anchoring and MCTA with twenty topics." and "To infer the topic distribution of documents, we pass in the topic matrices as inputs into variational inference [18], where topic variational parameter β is fixed and only document variational parameter γ is fitted. Then, we train a linear SVM on the topic distributions of documents [27] to classify document labels." (Sketched after the table.) |
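
Since the paper gives no formal pseudocode for the anchor-selection step it builds on, the following is a minimal sketch of greedy anchor selection in the spirit of FastAnchorWords, which the paper's multilingual anchoring extends by linking anchor words across languages through a bilingual dictionary. The function name is hypothetical, and the random-projection and exact span-distance details of the published algorithm are omitted; this is not the authors' implementation.

```python
import numpy as np

def select_anchors(Q, k):
    """Greedy anchor selection in the spirit of FastAnchorWords (simplified sketch).

    Q is assumed to be the row-normalized word co-occurrence matrix
    (one row per vocabulary word). At each step, pick the row farthest
    from the span of the rows chosen so far, then project that direction
    out of every row (Gram-Schmidt style).
    """
    residual = np.asarray(Q, dtype=float).copy()
    anchors = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(norms))          # row farthest from the current span
        anchors.append(idx)
        direction = residual[idx] / norms[idx]
        # Remove the chosen direction from every row so it cannot be picked again.
        residual -= np.outer(residual @ direction, direction)
    return anchors
```

Multilingual anchoring would additionally constrain the chosen anchors to words that have dictionary translations in both languages; that linking step is left out of this sketch.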
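
The "Experiment Setup" row describes inferring document-topic distributions with the topic matrix held fixed and then training a linear SVM (LIBLINEAR) on those distributions. Below is a hedged sketch of that classification pipeline using scikit-learn's LIBLINEAR-backed LinearSVC. The `document_topic_features` helper and the random stand-in data are hypothetical, and the simple normalization it performs is only a placeholder for the paper's variational inference with fixed β and fitted γ.

```python
import numpy as np
from sklearn.svm import LinearSVC  # scikit-learn's wrapper around LIBLINEAR

def document_topic_features(doc_word_counts, topic_word):
    """Hypothetical stand-in for the paper's inference step.

    The paper fixes the topic-word parameters (beta) and fits only the
    per-document parameters (gamma) via variational inference; here we
    simply score each document against each topic and normalize, purely
    for illustration.
    """
    scores = doc_word_counts @ topic_word.T        # (n_docs, n_topics)
    scores = np.maximum(scores, 1e-12)
    return scores / scores.sum(axis=1, keepdims=True)

# Hypothetical arrays standing in for the real corpora and learned topics.
rng = np.random.default_rng(0)
doc_word_counts = rng.poisson(0.2, size=(200, 500))   # 200 docs, 500-word vocabulary
topic_word = rng.dirichlet(np.ones(500), size=20)     # twenty topics, as in the paper
labels = rng.integers(0, 2, size=200)

features = document_topic_features(doc_word_counts, topic_word)
split = int(0.8 * len(features))                      # 80:20 train-test split, as reported
clf = LinearSVC()                                     # linear SVM, LIBLINEAR backend
clf.fit(features[:split], labels[:split])
print("test accuracy:", clf.score(features[split:], labels[split:]))
```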