Contrastive Learning for Neural Topic Model

Authors: Thong Nguyen, Anh Tuan Luu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our framework outperforms other state-of-the-art neural topic models in three common benchmark datasets that belong to various domains, vocabulary sizes, and document lengths in terms of topic coherence.
Researcher Affiliation | Collaboration | Thong Nguyen, VinAI Research, v.thongnt66@vinai.io; Luu Anh Tuan, Nanyang Technological University, anhtuan.luu@ntu.edu.sg
Pseudocode | Yes | Algorithm 1 (Approximate β) and Algorithm 2 (Contrastive Neural Topic Model)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conduct our experiments on three readily available datasets that belong to various domains, vocabulary sizes, and document lengths: 20Newsgroups (20NG) dataset [51], Wikitext-103 (Wiki) [53], IMDb movie reviews (IMDb) [54]
Dataset Splits | Yes | 20NG: "We conduct the dataset split with 48%, 12%, 40% for training, validation, and testing, respectively."; Wiki: "use the train/dev/test split of 70%, 15%, and 15%."; IMDb: "Respectively, we apply the dataset split of 50%, 25%, 25% for training, validation, and testing." (a split sketch follows the table)
Hardware Specification | No | The paper does not contain specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software like the BERT model, but does not specify version numbers for any software dependencies required for replication.
Experiment Setup | Yes | "We evaluate our methods both at K = 50 and K = 200."; "Our model k = 15"; "We collect the latent vectors inferred by neural topic models in K = 50 and train a Random Forest with the number of decision trees as 10 and the maximum depth as 8 to predict the class of each document." (a sketch of this classification check follows the table)
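
The split ratios quoted in the Dataset Splits row are stated in the paper, but no split script is released (see the Open Source Code row). Below is a minimal sketch of how those ratios could be reproduced; the use of scikit-learn's train_test_split, the helper name split_documents, and the fixed random seed are assumptions for illustration, not the authors' procedure.

```python
# Minimal sketch: reproducing the reported split ratios with scikit-learn.
# The two-stage train_test_split and the fixed random_state are assumptions;
# the paper does not describe how its splits were generated.
from sklearn.model_selection import train_test_split

def split_documents(docs, train_frac, val_frac, test_frac, seed=42):
    """Split a list of documents into train/validation/test by the given fractions."""
    assert abs(train_frac + val_frac + test_frac - 1.0) < 1e-9
    train, rest = train_test_split(docs, test_size=val_frac + test_frac, random_state=seed)
    val, test = train_test_split(rest, test_size=test_frac / (val_frac + test_frac), random_state=seed)
    return train, val, test

# Ratios reported in the paper: 20NG 48/12/40, Wiki 70/15/15, IMDb 50/25/25.
splits = {
    "20NG": (0.48, 0.12, 0.40),
    "Wiki": (0.70, 0.15, 0.15),
    "IMDb": (0.50, 0.25, 0.25),
}

# Usage with placeholder documents (real corpora would be loaded instead).
docs = [f"doc {i}" for i in range(100)]
train, val, test = split_documents(docs, *splits["20NG"])
```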
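
The Experiment Setup row also describes a downstream classification check: latent vectors from a K = 50 topic model are used to train a Random Forest with 10 decision trees and a maximum depth of 8 to predict each document's class. The sketch below illustrates that setup; only the 10-tree, depth-8, and K = 50 values come from the paper, while the function name, the accuracy metric, and the placeholder data are assumptions.

```python
# Sketch of the downstream classification check from the Experiment Setup row.
# Hyperparameters (10 trees, max depth 8) come from the paper; everything else
# (variable names, accuracy as the metric, random data) is illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def evaluate_latents(z_train, y_train, z_test, y_test, seed=0):
    """Train a small Random Forest on K=50 latent document vectors and score it."""
    clf = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=seed)
    clf.fit(z_train, y_train)
    return accuracy_score(y_test, clf.predict(z_test))

# Placeholder latent vectors: 1000 documents, K = 50 topics, 20 classes (as in 20NG).
rng = np.random.default_rng(0)
z = rng.random((1000, 50))
y = rng.integers(0, 20, size=1000)
print(evaluate_latents(z[:800], y[:800], z[800:], y[800:]))
```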