Topic Modeling as Multi-Objective Contrastive Optimization

Authors: Thong Thanh Nguyen, Xiaobao Wu, Xinshuai Dong, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our framework consistently produces higher-performing neural topic models in terms of topic coherence, topic diversity, and downstream performance."
Researcher Affiliation | Academia | "1 Institute of Data Science (IDS), National University of Singapore (NUS), Singapore; 2 Nanyang Technological University (NTU), Singapore; 3 Carnegie Mellon University (CMU), USA"
Pseudocode | Yes | "Algorithm 1: Setwise Contrastive Neural Topic Model as Multi-Objective Optimization."
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | "We adopt popular benchmark datasets spanning various domains, vocabulary sizes, and document lengths for experiments: (i) 20Newsgroups (20NG) (Lang, 1995); (ii) IMDb (Maas et al., 2011); (iii) Wikitext-103 (Wiki) (Merity et al., 2016); (iv) AG News (Zhang et al., 2015)."
Dataset Splits | No | For AG News, the paper states that the sizes are "30000 and 1900 for training and testing subsets, respectively," indicating train/test splits; however, it does not state validation splits for any of the datasets used in the main experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python or library versions) used for the experiments.
Experiment Setup | Yes | "In Table 8, we denote hyperparameter details of our neural topic models, i.e. learning rate η, batch size B, and the temperature τ for the InfoNCE loss. For training execution, the hyperparameters vary with respect to the dataset."

Table 8: Hyperparameter Settings for Neural Topic Model Training.

                             20NG           IMDb           Wiki
Hyperparameter            T=50   T=200   T=50   T=200   T=50   T=200
sample set size K           4      4       3      3       4      4
permutation matrix size P   8      8       8      8       8      8
temperature τ              0.2    0.2     0.2    0.2     0.2    0.2
learning rate η           0.002  0.002   0.002  0.002   0.001  0.002
batch size B               200    200     200    200     500    500
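The temperature τ quoted above parameterizes the InfoNCE contrastive objective. As a point of reference, here is a minimal, dependency-free sketch of a standard InfoNCE loss over a batch of paired representations, with in-batch negatives and τ = 0.2 as in Table 8; the function name and vector shapes are illustrative, not taken from the paper:

```python
import math

def info_nce_loss(anchors, positives, temperature=0.2):
    """Standard InfoNCE loss for paired representations.

    anchors, positives: lists of equal-length float vectors. The i-th
    positive matches the i-th anchor; every other row in `positives`
    serves as an in-batch negative. temperature=0.2 matches the tau
    reported in Table 8 (illustrative default, not the paper's code).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    losses = []
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarities against every candidate.
        logits = [cosine(anchor, p) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        # Negative log-softmax probability of the true positive.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

With correctly matched pairs the loss is small; shuffling the positives (so each anchor is scored against a mismatched "positive") increases it, which is the behavior the contrastive objective exploits during training.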