Alleviating "Posterior Collapse'' in Deep Topic Models via Policy Gradient

Authors: Yewen Li, Chaojie Wang, Zhibin Duan, Dongsheng Wang, Bo Chen, Bo An, Mingyuan Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our developed methods can effectively alleviate posterior collapse in deep topic models, contributing to providing higher-quality latent document representations.
Researcher Affiliation | Academia | Nanyang Technological University; Xidian University; The University of Texas at Austin
Pseudocode | No | The main body of the paper does not contain structured pseudocode or algorithm blocks. Appendix C is cited as containing details of the PG-based training algorithm, but Appendix C is not provided.
Open Source Code | Yes | The implementation is available at https://github.com/yewen99/dc-ETM.
Open Datasets | Yes | Four widely used document benchmarks, specifically R8 [39], 20Newsgroups (20News) [40], Reuters Corpus Volume I (RCV1) [41] and World Wide Web Knowledge Base (WebKB) [42], are included in the following experiments.
Dataset Splits | No | The paper uses standard document benchmarks and discusses training settings such as mini-batch size, but does not explicitly specify the training/validation/test splits needed for reproduction in the main text.
Hardware Specification | Yes | All experiments are performed with an Nvidia RTX 3090 GPU and implemented with PyTorch [44].
Software Dependencies | No | The paper states that experiments are "implemented with PyTorch [44]" but does not give a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | To make a fair comparison, we set the same network structure for all deep topic models as [256, 128, 64, 32, 16] from shallow to deep. For PTMs, we use the default hyperparameter settings in their published papers and accelerate the Gibbs sampling with GPU. For NTMs, we set the size of their hidden layers as 256, the embedding size as 100 for those incorporating word embeddings, like ETM, SawETM and dc-ETMs, and the mini-batch size as 200. For optimization, we adopt the same Adam optimizer [43] with a learning rate of 1e-2.