Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Topic Modeling as Multi-Objective Contrastive Optimization
Authors: Thong Thanh Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance. |
| Researcher Affiliation | Academia | ¹Institute of Data Science (IDS), National University of Singapore (NUS), Singapore, ²Nanyang Technological University (NTU), Singapore, ³Carnegie Mellon University (CMU), USA |
| Pseudocode | Yes | Algorithm 1 Setwise Contrastive Neural Topic Model as Multi-Objective Optimization. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We adopt popular benchmark datasets spanning various domains, vocabulary sizes, and document lengths for experiments: (i) 20Newsgroups (20NG) (Lang, 1995); (ii) IMDb (Maas et al., 2011); (iii) Wikitext-103 (Wiki) (Merity et al., 2016); (iv) AG News (Zhang et al., 2015), |
| Dataset Splits | No | For AG News, the paper states 'whose size is 30000 and 1900 for training and testing subsets, respectively', indicating train/test splits. However, it does not explicitly state validation splits for any of the datasets used in the main experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions) used for the experiments. |
| Experiment Setup | Yes | In Table 8, we denote hyperparameter details of our neural topic models, i.e. learning rate η, batch size B, and the temperature τ for the InfoNCE loss. For training execution, the hyperparameters vary with respect to the dataset. Table 8: Hyperparameter Settings for Neural Topic Model Training. Values are listed in the column order 20NG (T = 50, T = 200), IMDb (T = 50, T = 200), Wiki (T = 50, T = 200). Sample set size K: 4, 4, 3, 3, 4, 4. Permutation matrix size P: 8, 8, 8, 8, 8, 8. Temperature τ: 0.2, 0.2, 0.2, 0.2, 0.2, 0.2. Learning rate η: 0.002, 0.002, 0.002, 0.002, 0.001, 0.002. Batch size B: 200, 200, 200, 200, 500, 500. |
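
For anyone attempting to reproduce the setup, the Table 8 values quoted above can be captured as a small lookup structure. The following is a minimal sketch, not code from the paper: the names `COLUMNS`, `ROWS`, and `get_hparams` are hypothetical, and it assumes that T denotes the number of topics and that the flattened columns run in the order 20NG, IMDb, Wiki with T = 50 before T = 200. AG News does not appear in the quoted table, so it is left out.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]

# Column order of Table 8 as quoted above: (dataset, T) pairs.
# Assumption: T is the number of topics.
COLUMNS: List[Tuple[str, int]] = [
    ("20NG", 50), ("20NG", 200),
    ("IMDb", 50), ("IMDb", 200),
    ("Wiki", 50), ("Wiki", 200),
]

# One list per hyperparameter row, aligned with COLUMNS.
# The key names are illustrative; the values are taken from the quoted table.
ROWS: Dict[str, List[Number]] = {
    "sample_set_size_K": [4, 4, 3, 3, 4, 4],
    "permutation_matrix_size_P": [8, 8, 8, 8, 8, 8],
    "temperature_tau": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
    "learning_rate_eta": [0.002, 0.002, 0.002, 0.002, 0.001, 0.002],
    "batch_size_B": [200, 200, 200, 200, 500, 500],
}


def get_hparams(dataset: str, num_topics: int) -> Dict[str, Number]:
    """Return the hyperparameters for one (dataset, T) column.

    Raises ValueError if the combination is not one of the quoted columns.
    """
    idx = COLUMNS.index((dataset, num_topics))
    return {name: values[idx] for name, values in ROWS.items()}


if __name__ == "__main__":
    print(get_hparams("Wiki", 50))
```

Storing the rows as column-aligned lists mirrors the layout of the table, which makes it easier to check each value against the source by eye.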