Sawtooth Factorial Topic Embeddings Guided Gamma Belief Network

Authors: Zhibin Duan, Dongsheng Wang, Bo Chen, Chaojie Wang, Wenchao Chen, Yewen Li, Jie Ren, Mingyuan Zhou

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on big corpora show that our models outperform other neural topic models on extracting deeper interpretable topics and deriving better document representations.
Researcher Affiliation | Academia | 1. National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China. 2. McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA.
Pseudocode | Yes | Algorithm 1: Upward-Downward Autoencoding Variational Inference for SawETM
Open Source Code | Yes | Our code is available at https://github.com/BoChenGroup/SawETM.
Open Datasets | Yes | We run our experiments on four widely used benchmark corpora including R8, 20Newsgroups (20NG), Reuters Corpus Volume I (RCV1), and PG-19.
Dataset Splits | No | For each corpus, we randomly select 80% of the word tokens from each document to form a training matrix T, holding out the remaining 20% to form a testing matrix Y. ... R8 is partitioned into a training set of 5,485 documents and a testing set of 2,189 documents. ... 20NG is split into a training set of 11,314 documents and a testing set of 7,532 documents. While train and test splits are defined, a distinct validation set is not explicitly mentioned with its proportion or count. (A sketch of the per-document 80/20 token split appears after the table.)
Hardware Specification | Yes | All experiments are performed on an Nvidia GTX 8000 GPU and coded with PyTorch.
Software Dependencies | No | All experiments are performed on an Nvidia GTX 8000 GPU and coded with PyTorch. ... For optimization, the Adam optimizer (Kingma & Ba, 2014) is utilized... The paper mentions PyTorch and the Adam optimizer but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For the hierarchical topic models, the network structures of 15-layer models are set as [256, 224, 192, 160, 128, 112, 96, 80, 64, 56, 48, 40, 32, 16, 8]. For the embedding-based topic models such as ETM, DETM, and SawETM, we set the embedding size as 100. For the NTMs, we set the hidden size as 256. For optimization, the Adam optimizer (Kingma & Ba, 2014) is utilized with a learning rate of 1e-2. The mini-batch size is set as 200 in all experiments. (A configuration sketch with these settings appears after the table.)
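
The per-document 80/20 token split quoted under Dataset Splits can be read as follows. This is a minimal sketch of one plausible implementation, not the authors' code; the helper `split_document`, the fixed random seed, and the example token IDs are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_document(token_ids, train_ratio=0.8):
    """Randomly assign ~80% of a document's word tokens to the training
    matrix T and hold out the remaining ~20% for the testing matrix Y
    (hypothetical helper; the paper does not publish this routine)."""
    token_ids = np.asarray(token_ids)
    perm = rng.permutation(len(token_ids))
    cut = int(train_ratio * len(token_ids))
    return token_ids[perm[:cut]], token_ids[perm[cut:]]

# Example with made-up token IDs for a single document.
train_tokens, test_tokens = split_document([3, 3, 7, 12, 12, 12, 41, 98, 98, 105])
```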
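
As a companion to the Experiment Setup row, the sketch below wires the reported hyperparameters (15-layer structure, 100-dimensional embeddings, hidden size 256, Adam with learning rate 1e-2, mini-batch size 200) into a PyTorch optimizer. The `build_sawetm` factory and its placeholder model are assumptions standing in for the authors' released code.

```python
import torch

# Hyperparameters as quoted in the Experiment Setup row.
layer_sizes = [256, 224, 192, 160, 128, 112, 96, 80, 64, 56, 48, 40, 32, 16, 8]
embedding_size = 100   # ETM / DETM / SawETM embedding size
hidden_size = 256      # NTM hidden size
batch_size = 200
learning_rate = 1e-2

def build_sawetm(vocab_size: int) -> torch.nn.Module:
    # Hypothetical stand-in for the SawETM model from the released repository;
    # a single linear layer keeps this sketch runnable.
    return torch.nn.Linear(vocab_size, layer_sizes[0])

model = build_sawetm(vocab_size=2000)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```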