Alleviating "Posterior Collapse" in Deep Topic Models via Policy Gradient
Authors: Yewen Li, Chaojie Wang, Zhibin Duan, Dongsheng Wang, Bo Chen, Bo An, Mingyuan Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our developed methods can effectively alleviate posterior collapse in deep topic models, contributing to providing higher-quality latent document representations. |
| Researcher Affiliation | Academia | Nanyang Technological University; Xidian University; The University of Texas at Austin |
| Pseudocode | No | The main body of the paper does not contain structured pseudocode or algorithm blocks. Appendix C is cited as containing details of the PG-based training algorithm, but that appendix is not included in the provided text. |
| Open Source Code | Yes | The implementation is available at https://github.com/yewen99/dc-ETM. |
| Open Datasets | Yes | Four widely used document benchmarks, specifically R8 [39], 20Newsgroups (20News) [40], Reuters Corpus Volume I (RCV1) [41] and World Wide Web Knowledge Base (Web KB) [42] are included in the following experiments. |
| Dataset Splits | No | The paper uses standard document benchmarks and discusses training settings such as the mini-batch size, but the main text does not explicitly provide the training/validation/test splits needed for reproduction. |
| Hardware Specification | Yes | All experiments are performed with an Nvidia RTX 3090 GPU and implemented with PyTorch [44]. |
| Software Dependencies | No | The paper states that experiments are 'implemented with PyTorch [44]' but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | To make a fair comparison, we set the same network structure for all deep topic models as [256, 128, 64, 32, 16] from shallow to deep. For PTMs, we use the default hyperparameter settings in their published papers and accelerate the Gibbs sampling with GPU. For NTMs, we set the size of their hidden layers as 256, the embedding size as 100 for those incorporating word embeddings, like ETM, SawETM and dc-ETMs, and the mini-batch size as 200. For optimization, we adopt the same Adam optimizer [43] with a learning rate of 1e-2. |
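
The experiment-setup row amounts to a small set of hyperparameters. The sketch below merely collects them in PyTorch-style code for convenience; the placeholder encoder stack, `vocab_size`, and all variable names are assumptions for illustration, not the authors' dc-ETM implementation or its policy-gradient training loop (see their repository for that).

```python
import torch
from torch import nn

# Hyperparameters as reported in the Experiment Setup row.
topic_sizes = [256, 128, 64, 32, 16]   # per-layer topic counts, shallow to deep
hidden_size = 256                      # hidden-layer size for the NTMs
embedding_size = 100                   # word/topic embedding size (ETM, SawETM, dc-ETMs)
batch_size = 200                       # mini-batch size
learning_rate = 1e-2                   # Adam learning rate

# Placeholder encoder whose layer widths follow the reported structure;
# `vocab_size` is assumed here purely for illustration.
vocab_size = 5000
widths = [vocab_size, hidden_size] + topic_sizes
encoder = nn.Sequential(*[
    layer
    for i in range(len(widths) - 1)
    for layer in (nn.Linear(widths[i], widths[i + 1]), nn.ReLU())
])

optimizer = torch.optim.Adam(encoder.parameters(), lr=learning_rate)

dummy_batch = torch.rand(batch_size, vocab_size)   # stand-in bag-of-words input
print(encoder(dummy_batch).shape)                  # torch.Size([200, 16])
```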