Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Contrastive Learning for Neural Topic Model
Authors: Thong Nguyen, Anh Tuan Luu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our framework outperforms other state-of-the-art neural topic models in three common benchmark datasets that belong to various domains, vocabulary sizes, and document lengths in terms of topic coherence. |
| Researcher Affiliation | Collaboration | Thong Nguyen (VinAI Research, EMAIL); Luu Anh Tuan (Nanyang Technological University, EMAIL) |
| Pseudocode | Yes | Algorithm 1 (Approximate β) and Algorithm 2 (Contrastive Neural Topic Model) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We conduct our experiments on three readily available datasets that belong to various domains, vocabulary sizes, and document lengths: 20Newsgroups (20NG) dataset [51], Wikitext-103 (Wiki) [53], IMDb movie reviews (IMDb) [54] |
| Dataset Splits | Yes | 20NG: 48%, 12%, and 40% for training, validation, and testing, respectively; Wiki: 70%, 15%, and 15% for the train/dev/test split; IMDb: 50%, 25%, and 25% for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper does not contain specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the BERT model, but does not specify version numbers for any software dependencies required for replication. |
| Experiment Setup | Yes | We evaluate our methods at both K = 50 and K = 200; Our model: k = 15; We collect the latent vectors inferred by neural topic models at K = 50 and train a Random Forest with 10 decision trees and a maximum depth of 8 to predict the class of each document. |
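The Dataset Splits row gives only ratios. The paper excerpt does not say whether documents were shuffled, stratified, or seeded, so the helper below is a hypothetical sketch of one plausible procedure. It is not the authors' code. `SPLITS` and `split_indices` are illustrative names introduced here.

```python
import random

# Train/validation/test ratios as reported in the Dataset Splits row.
SPLITS = {
    "20NG": (0.48, 0.12, 0.40),
    "Wiki": (0.70, 0.15, 0.15),
    "IMDb": (0.50, 0.25, 0.25),
}


def split_indices(n_docs, ratios, seed=0):
    """Hypothetical helper: shuffle document indices and cut them by ratio.

    Assumption: a seeded random shuffle; the paper does not specify this.
    The test part receives whatever remains after rounding the first two parts.
    """
    train_r, val_r, test_r = ratios
    assert abs(train_r + val_r + test_r - 1.0) < 1e-9, "ratios must sum to 1"
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n_docs * train_r))
    n_val = int(round(n_docs * val_r))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(1000, SPLITS["20NG"])
```

With 1,000 documents and the 20NG ratios, this yields parts of 480, 120, and 400 indices. Assigning the rounding remainder to the test part keeps the three parts disjoint and exhaustive.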
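The Experiment Setup row describes a document-classification probe with the following design: latent vectors are inferred at K = 50, and a Random Forest with 10 trees and maximum depth 8 is trained on them. The sketch below assumes scikit-learn's `RandomForestClassifier`, although the paper excerpt does not name a library. It also uses randomly generated placeholder vectors and labels in place of real topic-model outputs, so its accuracy is meaningless. Treating the latent vectors as topic proportions (rows summing to 1) is likewise an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

K = 50  # dimensionality of the collected latent vectors, per the excerpt

# Placeholder data (NOT real results): Dirichlet samples stand in for the
# latent vectors a neural topic model would infer; labels are random classes.
rng = np.random.default_rng(0)
n_docs, n_classes = 200, 4
theta = rng.dirichlet(np.ones(K), size=n_docs)
labels = rng.integers(0, n_classes, size=n_docs)

# Hyperparameters as reported: 10 decision trees, maximum depth 8.
clf = RandomForestClassifier(n_estimators=10, max_depth=8, random_state=0)
clf.fit(theta[:150], labels[:150])

# Held-out accuracy on the placeholder split (train/test cut chosen arbitrarily).
accuracy = clf.score(theta[150:], labels[150:])
```

In a real replication, `theta` would be replaced by the inferred latent vectors for each dataset's documents and `labels` by their class annotations, using the splits listed above.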