Diagnosing and Improving Topic Models by Analyzing Posterior Variability
Authors: Linzi Xing, Michael Paul
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimenting with latent Dirichlet allocation on two datasets, we propose ideas incorporating information about the posterior distributions at the topic level and at the word level. |
| Researcher Affiliation | Academia | Linzi Xing Department of Computer Science University of Colorado, Boulder, CO 80309 linzi.xing@colorado.edu Michael J. Paul Department of Information Science University of Colorado, Boulder, CO 80309 mpaul@colorado.edu |
| Pseudocode | No | The paper describes methods textually but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing its code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We experiment with two datasets. The News corpus contains 2,243 articles from the Associated Press. The Wiki corpus contains 10,000 articles from Wikipedia. |
| Dataset Splits | No | The paper describes running Gibbs samplers with burn-in periods and sample collection, but it does not specify explicit training, validation, or test dataset splits in the conventional sense for model training and evaluation. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory specifications, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions methods and platforms used (e.g., LDA, Gibbs sampling, Amazon Mechanical Turk) but does not list specific software libraries or their version numbers required for reproduction. |
| Experiment Setup | Yes | We set the number of topics to 50 for News and 100 for Wiki. We ran the Gibbs samplers for a burn-in period of 1,000 iterations, during which we also optimized the hyperparameters of the Dirichlet priors, before freezing the hyperparameters and collecting 100 samples, each separated by a 10-sample lag, running for a total of 2,000 iterations. |
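
The "Experiment Setup" and "Research Type" rows above describe a Gibbs-sampled LDA run (burn-in, hyperparameter optimization, lagged posterior samples) and the paper's core idea of analyzing posterior variability at the topic and word level. The sketch below is not the authors' code: it is a minimal, self-contained illustration of that sampling schedule on a toy corpus, followed by one plausible word-level variability statistic (the standard deviation of each topic-word probability across the collected samples). The toy corpus, the prior values, and the variability statistic itself are assumptions; only the schedule constants (50 topics, 1,000 burn-in iterations, 100 samples at a lag of 10, 2,000 iterations total) come from the quoted setup, and the hyperparameter optimization performed during burn-in is omitted.

```python
# Minimal sketch (not the authors' code): a toy collapsed Gibbs sampler for LDA
# following the sampling schedule quoted in the "Experiment Setup" row, then a
# simple word-level posterior-variability diagnostic across the collected samples.
# Corpus, priors, and the variability statistic are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- toy corpus: each document is a list of word ids (assumption) ---
V = 200                                     # vocabulary size (illustrative)
docs = [rng.integers(0, V, size=30).tolist() for _ in range(50)]

K = 50                                      # number of topics (50 for News in the paper)
alpha, beta = 0.1, 0.01                     # symmetric Dirichlet priors (illustrative values)
burn_in = 1000                              # burn-in iterations, as in the paper
n_samples = 100                             # posterior samples to collect, as in the paper
lag = 10                                    # iterations between samples; 1000 + 100*10 = 2000 total

# count tables for collapsed Gibbs sampling
ndk = np.zeros((len(docs), K))              # document-topic counts
nkw = np.zeros((K, V))                      # topic-word counts
nk = np.zeros(K)                            # per-topic token totals
z = []                                      # topic assignment per token
for d, doc in enumerate(docs):
    zd = rng.integers(0, K, size=len(doc))
    z.append(zd)
    for w, k in zip(doc, zd):
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

def gibbs_sweep():
    """One full sweep of collapsed Gibbs sampling over every token."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# burn-in (the paper also optimizes the Dirichlet hyperparameters here; omitted in this toy)
for _ in range(burn_in):
    gibbs_sweep()

# collect lagged posterior samples of the topic-word distributions phi
phi_samples = []
for _ in range(n_samples):
    for _ in range(lag):
        gibbs_sweep()
    phi = (nkw + beta) / (nk[:, None] + V * beta)
    phi_samples.append(phi)
phi_samples = np.stack(phi_samples)         # shape: (n_samples, K, V)

# word-level posterior variability: std of each word's probability within each topic
# across samples (one plausible way to quantify the variability the paper analyzes)
word_variability = phi_samples.std(axis=0)  # shape: (K, V)
topic_variability = word_variability.mean(axis=1)
print("most stable topic:", topic_variability.argmin())
print("least stable topic:", topic_variability.argmax())
```

Because it re-samples every token in pure Python, the sketch is only practical on toy data; a reproduction at the paper's scale would use an optimized Gibbs-sampling implementation (for example MALLET or a vectorized sampler), since the paper itself does not name the software it used.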