Contrastive Learning for Neural Topic Model
Authors: Thong Nguyen, Anh Tuan Luu
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our framework outperforms other state-of-the-art neural topic models in three common benchmark datasets that belong to various domains, vocabulary sizes, and document lengths in terms of topic coherence. (A coherence-scoring sketch follows the table.) |
| Researcher Affiliation | Collaboration | Thong Nguyen (VinAI Research, v.thongnt66@vinai.io) and Luu Anh Tuan (Nanyang Technological University, anhtuan.luu@ntu.edu.sg) |
| Pseudocode | Yes | Algorithm 1 (Approximate β) and Algorithm 2 (Contrastive Neural Topic Model) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We conduct our experiments on three readily available datasets that belong to various domains, vocabulary sizes, and document lengths: 20Newsgroups (20NG) dataset [51], Wikitext-103 (Wiki) [53], IMDb movie reviews (IMDb) [54] |
| Dataset Splits | Yes | 20NG: 48% / 12% / 40% for training, validation, and testing; Wiki: 70% / 15% / 15% train/dev/test; IMDb: 50% / 25% / 25% for training, validation, and testing. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not contain specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the BERT model, but does not specify version numbers for any software dependencies required for replication. |
| Experiment Setup | Yes | Topics are evaluated at both K = 50 and K = 200; the model uses k = 15; for document classification, the latent vectors inferred by the neural topic models at K = 50 are fed to a Random Forest with 10 decision trees and a maximum depth of 8 to predict the class of each document. (A classification sketch follows the table.) |
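
The topic-coherence evaluation referenced in the Research Type row is commonly computed with gensim's `CoherenceModel`. The sketch below is an illustration only: the NPMI variant, the top-word cutoff, and the tiny placeholder corpus and topics are assumptions, not details confirmed by the excerpts above.

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

# Placeholder tokenised corpus; the paper scores topics against the
# documents of each benchmark dataset (20NG, Wiki, IMDb).
texts = [
    ["neural", "topic", "model", "learning", "document"],
    ["contrastive", "learning", "objective", "document", "representation"],
]
dictionary = Dictionary(texts)

# Top words per topic as produced by a trained topic model (placeholder).
topics = [
    ["topic", "model", "document"],
    ["contrastive", "learning", "objective"],
]

# NPMI coherence is a common choice for neural topic models; the exact
# coherence variant and cutoff are assumptions here, not from the paper.
cm = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary,
                    coherence="c_npmi", topn=3)
print("NPMI coherence:", cm.get_coherence())
```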
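
The ratios in the Dataset Splits row can be reproduced with a two-stage split. The helper below is a minimal sketch assuming scikit-learn's `train_test_split`; only the ratios come from the paper, while the seed, function name, and placeholder documents are illustrative.

```python
from sklearn.model_selection import train_test_split

# Ratios reported in the paper: (train, validation, test).
SPLITS = {
    "20NG": (0.48, 0.12, 0.40),
    "Wiki": (0.70, 0.15, 0.15),
    "IMDb": (0.50, 0.25, 0.25),
}

def split_documents(docs, train_frac, val_frac, test_frac, seed=42):
    """Split a document list into train/validation/test partitions."""
    assert abs(train_frac + val_frac + test_frac - 1.0) < 1e-9
    train, rest = train_test_split(docs, train_size=train_frac, random_state=seed)
    # Split the remainder so validation and test keep their relative sizes.
    val, test = train_test_split(
        rest, train_size=val_frac / (val_frac + test_frac), random_state=seed
    )
    return train, val, test

# Example with placeholder documents.
docs = [f"document {i}" for i in range(1000)]
train, val, test = split_documents(docs, *SPLITS["20NG"])
print(len(train), len(val), len(test))  # roughly 480, 120, 400
```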
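
The document-classification check in the Experiment Setup row maps onto a small Random Forest. In the sketch below, only the forest size (10 trees), the maximum depth (8), and the K = 50 latent dimensionality come from the paper; the random placeholder features stand in for the topic proportions inferred by a trained topic model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Placeholder latent vectors: in the paper these are the K = 50 topic
# proportions inferred by each neural topic model for every document.
rng = np.random.default_rng(0)
z_train, z_test = rng.random((800, 50)), rng.random((200, 50))
y_train, y_test = rng.integers(0, 20, 800), rng.integers(0, 20, 200)

# Hyperparameters stated in the paper: 10 decision trees, maximum depth 8.
clf = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=0)
clf.fit(z_train, y_train)
print("classification accuracy:", accuracy_score(y_test, clf.predict(z_test)))
```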