HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding
Authors: Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments: We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. The statistics of these datasets are presented in Table 1. |
| Researcher Affiliation | Academia | Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan: National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China (xuyishi@stu.xidian.edu.cn, bchen@mail.xidian.edu.cn). Mingyuan Zhou: McCombs School of Business, The University of Texas at Austin, USA (mingyuan.zhou@mccombs.utexas.edu). |
| Pseudocode | Yes | Algorithm 1 Knowledge-Guided Topic Taxonomy Mining. Input: mini-batch size B, number of layers T, adjacent matrix A built from concept taxonomy. Initialize the variational network parameters Ω and the word and topic embeddings {α^(l)}_{l=0}^{L}; while not converged do... |
| Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/HyperMiner |
| Open Datasets | Yes | We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. |
| Dataset Splits | No | Concretely, with the default training/test split of each dataset, we first train a topic model on the training set, and then the trained model is used to extract features θ of all test documents. |
| Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]' in the checklist, but the main text does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud provider instances used for experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies beyond a general mention of Python 3.8 in the author checklist, and no other key libraries or solvers with their versions are specified in the main text. |
| Experiment Setup | Yes | The embedding dimension for embedded topic models is set as 50. ... τ is the temperature parameter. ... λ is the hyper-parameter used to control the impact of the regularization term... Input: mini-batch size B... |
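
For readers unfamiliar with the hyperbolic embeddings named in the title, the following is a minimal sketch of a distance computation in hyperbolic space at the quoted embedding dimension of 50. It assumes the unit Poincaré ball model with curvature -1; the table above does not state which hyperbolic model the paper uses, and the function name `poincare_distance` and the sample points are hypothetical and not taken from the authors' repository.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: curvature -1 Poincare ball model (not confirmed by the table above).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)

DIM = 50  # embedding dimension quoted in the Experiment Setup row
origin = [0.0] * DIM
point = [0.5] + [0.0] * (DIM - 1)  # hypothetical point with Euclidean norm 0.5

d = poincare_distance(origin, point)
print(d)
```

Under this assumption, distances grow without bound as points approach the boundary of the ball, which is the property that makes such spaces a natural fit for tree-like structures such as taxonomies.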