Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes
Authors: Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, Mingyuan Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted a wealth of quantitative and qualitative experiments, and the results show that our approach comprehensively outperforms established topic models. |
| Researcher Affiliation | Academia | Yishi Xu, Jianqiao Sun, Yudi Su, Xinyang Liu, Zhibin Duan, Bo Chen, National Key Laboratory of Radar Signal Processing, Xidian University, Xi'an, China, 710071, EMAIL, EMAIL; Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, TX 78712, USA, EMAIL |
| Pseudocode | Yes | In Alg. 1 and Alg. 2, we present the training and meta-testing procedures of our Meta-CETM. |
| Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/Meta-CETM. |
| Open Datasets | Yes | We conducted experiments on four widely used textual benchmark datasets, specifically 20Newsgroups (20NG) [38], Yahoo Answers Topics (Yahoo) [39], DBpedia (DB14) [40], and Web of Science (WOS) [41]. |
| Dataset Splits | No | The paper describes a support set and a query set for each task (80%/20% split) but does not explicitly mention a separate validation set. |
| Hardware Specification | Yes | Finally, we train our model using the Adam optimizer [48] with a learning rate of 1 × 10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
| Software Dependencies | No | The paper mentions 'spaCy' and the 'gensim package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all compared methods, we set the number of topics to 10, and for all NTMs, the hidden layer size of the encoder is set to 300. For all embedding-based topic models, i.e., ETM, MAML-ETM, Meta-SawETM, and our Meta-CETM, we load pretrained GloVe word embeddings [47] as the initialization for a fair comparison. Finally, we train our model using the Adam optimizer [48] with a learning rate of 1 × 10⁻² for 10 epochs on an NVIDIA GeForce RTX 3090 graphics card. |
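The reported experiment setup can be collected into a single configuration sketch. The hyperparameter values below come directly from the table; the `TrainConfig` class itself is illustrative and not part of the authors' released code.

```python
from dataclasses import dataclass

# Illustrative container for the hyperparameters reported in the
# Experiment Setup row; only the values are taken from the paper.
@dataclass(frozen=True)
class TrainConfig:
    num_topics: int = 10            # fixed for all compared methods
    encoder_hidden_size: int = 300  # hidden layer size for all NTMs
    embedding_init: str = "GloVe"   # pretrained word-embedding initialization
    optimizer: str = "Adam"
    learning_rate: float = 1e-2     # 1 × 10⁻²
    epochs: int = 10
    gpu: str = "NVIDIA GeForce RTX 3090"

cfg = TrainConfig()
print(cfg.learning_rate)  # 0.01
```

Freezing the dataclass makes the reported settings immutable, which is a common way to keep experiment configs reproducible.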