Alleviating "Posterior Collapse'' in Deep Topic Models via Policy Gradient

Authors: Yewen Li, Chaojie Wang, Zhibin Duan, Dongsheng Wang, Bo Chen, Bo An, Mingyuan Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our developed methods can effectively alleviate posterior collapse in deep topic models, contributing to providing higher-quality latent document representations.
Researcher Affiliation | Academia | Nanyang Technological University; Xidian University; The University of Texas at Austin
Pseudocode | No | The main body of the paper does not contain structured pseudocode or algorithm blocks. Appendix C is cited as containing details of the PG-based training algorithm, but Appendix C is not provided.
Open Source Code | Yes | The implementation is available at https://github.com/yewen99/dc-ETM.
Open Datasets | Yes | Four widely used document benchmarks, specifically R8 [39], 20Newsgroups (20News) [40], Reuters Corpus Volume I (RCV1) [41] and World Wide Web Knowledge Base (WebKB) [42], are included in the following experiments.
Dataset Splits | No | The paper uses standard document benchmarks and discusses training settings such as mini-batch size, but does not explicitly specify the training/validation/test splits needed for reproduction in the main text.
Hardware Specification | Yes | All experiments are performed with an Nvidia RTX 3090 GPU and implemented with PyTorch [44].
Software Dependencies | No | The paper states that experiments are "implemented with PyTorch [44]" but does not give a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | To make a fair comparison, we set the same network structure for all deep topic models as [256, 128, 64, 32, 16] from shallow to deep. For PTMs, we use the default hyperparameter settings in their published papers and accelerate the Gibbs sampling with GPU. For NTMs, we set the size of their hidden layers as 256, the embedding size as 100 for those incorporating word embeddings, like ETM, SawETM and dc-ETMs, and the mini-batch size as 200. For optimization, we adopt the same Adam optimizer [43] with a learning rate of 1e-2.