Topic Modeling Revisited: A Document Graph-based Neural Network Perspective

Authors: Dazhong Shen, Chuan Qin, Chao Wang, Zheng Dong, Hengshu Zhu, Hui Xiong

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, extensive experiments on four benchmark datasets have clearly demonstrated the effectiveness and interpretability of GNTM compared with state-of-the-art baselines."
Researcher Affiliation | Collaboration | Dazhong Shen (1,2), Chuan Qin (2), Chao Wang (1,2), Zheng Dong (2), Hengshu Zhu (2), Hui Xiong (3). Affiliations: (1) School of Computer Science and Technology, University of Science and Technology of China; (2) Baidu Talent Intelligence Center, Baidu Inc.; (3) Artificial Intelligence Thrust, The Hong Kong University of Science and Technology.
Pseudocode | No | The paper does not contain explicit pseudocode or labeled algorithm blocks; Figures 1(b) and 2 are diagrams, not pseudocode.
Open Source Code | Yes | "Our code and data are available at https://github.com/SmilesDZgk/GNTM."
Open Datasets | Yes | "Our experiments are conducted on four benchmark datasets with varying sizes, including 20 News Groups (20NG) [27], Tag My News (TMN) [51], the British National Corpus (BNC) [10], and Reuters, extracted from the Reuters-21578 dataset. The statistics and links of these datasets are shown in Appendix A.4."
Dataset Splits | No | The paper uses benchmark datasets but does not state train/validation/test split percentages or sample counts, nor does it point to predefined splits in enough detail for reproduction. It mentions evaluating clustering on "labeled datasets" but gives no split details there either.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components like the Adam optimizer and GloVe word embeddings but does not provide version numbers for any software dependency needed to replicate the experiments.
Experiment Setup | Yes | "In practice, we construct document graphs with window size s = 5. We set the Dirichlet prior parameter α = 1. We utilized 300-dimensional GloVe word embeddings [41] to fix X (i.e., L = 300) in our model and word vectors in ETM. We also set the size of word vectors µ_v as 300, i.e., H = 300. The sizes of the transitional vectors a_k and b_k were set to Y = 64. For the optimization, the Adam [23] optimizer has been used with an initial learning rate of 0.001 and the linear learning rate decay trick to find the optimum."
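To make the quoted setup concrete, here is a minimal Python sketch of the sliding-window document-graph construction with window size s = 5. This is an assumed reading, not the authors' implementation (their code is at https://github.com/SmilesDZgk/GNTM); in particular, the edge direction and count-based weighting are assumptions.

```python
# Minimal sketch of sliding-window document-graph construction.
# Assumptions: edges are directed and weighted by co-occurrence counts;
# the authors' exact definition may differ.
from collections import defaultdict

def build_document_graph(tokens, window_size=5):
    """Add an edge (u, v) for every pair of words where v occurs within
    `window_size` tokens after u, accumulating co-occurrence counts."""
    edges = defaultdict(int)
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window_size]:
            edges[(u, v)] += 1
    return dict(edges)

# Toy example document.
print(build_document_graph("the cat sat on the mat".split()))
```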
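The optimization settings in the same row can likewise be mirrored in a short PyTorch sketch. Only the dimensions (L, H, Y), the Dirichlet prior α, the initial learning rate of 0.001, and the linear decay come from the paper; the placeholder model and the step budget NUM_STEPS are hypothetical.

```python
# Hedged PyTorch sketch of the quoted optimization setup; GNTM itself is
# replaced by a placeholder module, and NUM_STEPS is an assumed budget.
import torch

L = 300      # dimension of the frozen GloVe embeddings X (quoted)
H = 300      # dimension of the word vectors mu_v (quoted)
Y = 64       # dimension of the transitional vectors a_k, b_k (quoted)
ALPHA = 1.0  # Dirichlet prior parameter alpha (quoted)

model = torch.nn.Linear(L, H)  # placeholder standing in for GNTM
NUM_STEPS = 10_000             # assumed; the paper does not state it

# Adam with initial learning rate 0.001 and a linearly decaying multiplier;
# call scheduler.step() once per optimization step.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / NUM_STEPS)
)
```

LambdaLR with a linearly decreasing multiplier is one common way to implement the "linear learning rate decay trick"; the paper does not specify the exact schedule.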