Topic Modeling on Health Journals With Regularized Variational Inference
Authors: Robert Giaquinto, Arindam Banerjee
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show significant improvements over competing topic models particularly after regularization, and highlight the DAP model's unique ability to capture common journeys shared by different authors. ... Section 5 introduces the evaluation dataset and procedure. Section 6 shares the results of the experiments. |
| Researcher Affiliation | Academia | Robert Giaquinto, Arindam Banerjee Dept of Computer Science & Engineering University of Minnesota, Twin Cities {smit7982@umn.edu, banerjee@cs.umn.edu} |
| Pseudocode | No | No structured pseudocode or algorithm blocks are provided. The generative process of the model is described in prose. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The Caring Bridge (CB) dataset is mentioned: 'The full dataset includes 13.1 million journals written by approximately half a million authors between 2006 and 2016. From the CB dataset we draw an evaluation dataset consisting of journals written by authors who posted, on average, at least twice a month over a one year period.' However, no link, DOI, repository, or citation for public access to this specific dataset is provided. |
| Dataset Splits | No | The paper states: 'Journals are split into training and test sets with 90% of each author's journals (N = 103,018) for training and 10% (N = 11,728) for testing.' It also mentions '10-fold cross validation'. However, a separate validation split (e.g., for hyperparameter tuning) is not explicitly mentioned. |
| Hardware Specification | No | The paper mentions 'University of Minnesota Supercomputing Institute (MSI) for technical support' in the acknowledgments, but no specific hardware details such as GPU/CPU models, processor types, or memory used for experiments are provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library names with versions) are mentioned. |
| Experiment Setup | Yes | Following the approach of others, we simply fix the number of topics at 25 for all models. The number of personas learned by the DAP model is fixed at 15. |
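
The Dataset Splits row quotes a per-author split: 90% of each author's journals go to training and 10% to testing. Since the paper provides no code, the following is only a minimal sketch of how such a split could be done. The function name `split_by_author`, the `(author_id, text)` input format, the random shuffling within each author, and the rounding rule are all assumptions made for illustration; the paper does not say how journals were assigned to each side.

```python
import random
from collections import defaultdict


def split_by_author(journals, test_fraction=0.1, seed=0):
    """Split (author_id, text) pairs so each author contributes to both sets.

    Hypothetical helper: the shuffle and rounding below are assumptions,
    not the procedure used in the paper.
    """
    rng = random.Random(seed)

    # Group journal texts by their author.
    by_author = defaultdict(list)
    for author_id, text in journals:
        by_author[author_id].append(text)

    train, test = [], []
    for author_id in sorted(by_author):
        texts = list(by_author[author_id])
        rng.shuffle(texts)  # assumption: random assignment within an author
        n_test = round(len(texts) * test_fraction)
        test.extend((author_id, t) for t in texts[:n_test])
        train.extend((author_id, t) for t in texts[n_test:])
    return train, test


if __name__ == "__main__":
    # Toy data: two authors with 10 and 20 journals.
    toy = [("a", f"a{i}") for i in range(10)] + [("b", f"b{i}") for i in range(20)]
    train, test = split_by_author(toy)
    print(len(train), len(test))
```

Splitting within each author, rather than across the pooled corpus, keeps every author represented in both sets, which matches the quoted description of holding out a share of each author's journals.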