HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding

Authors: Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan, Mingyuan Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments. We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51]. The statistics of these datasets are presented in Table 1.
Researcher Affiliation | Academia | Yishi Xu, Dongsheng Wang, Bo Chen, Ruiying Lu, Zhibin Duan: National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China (xuyishi@stu.xidian.edu.cn, bchen@mail.xidian.edu.cn); Mingyuan Zhou: McCombs School of Business, The University of Texas at Austin, USA (mingyuan.zhou@mccombs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Knowledge-Guided Topic Taxonomy Mining. Input: mini-batch size B, number of layers T, adjacent matrix A built from concept taxonomy. Initialize the variational network parameters Ω and the word and topic embeddings {α^(l)}_{l=0}^{L}; while not converged do...
Open Source Code | Yes | Our code is available at https://github.com/NoviceStone/HyperMiner
Open Datasets | Yes | We conduct our experiments on four benchmark datasets with various sizes and document lengths, including 20Newsgroups (20NG) [48], Tag My News (TMN) [49], WikiText-103 (WIKI) [50], and Reuters Corpus Volume II (RCV2) [51].
Dataset Splits | No | Concretely, with the default training/test split of each dataset, we first train a topic model on the training set, and then the trained model is used to extract features θ of all test documents.
Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]' in the checklist, but the main text does not provide specific hardware details such as GPU/CPU models, memory amounts, or cloud provider instances used for the experiments.
Software Dependencies | No | Beyond a general mention of Python 3.8 in the author checklist, the paper gives no version numbers for its software dependencies, and the main text does not specify any other key libraries or solvers with their versions.
Experiment Setup | Yes | The embedding dimension for embedded topic models is set as 50. ... τ is the temperature parameter. ... λ is the hyper-parameter used to control the impact of the regularization term... Input: mini-batch size B...
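
Algorithm 1, quoted in the Pseudocode entry, takes as input an adjacency matrix A built from a concept taxonomy. The excerpt does not say how A is constructed, so the following is only a minimal sketch of one plausible construction. The function name `build_adjacency`, the (child, parent) pair representation, the `symmetric` flag, and the toy concepts are all assumptions made for illustration and are not taken from the paper or its repository.

```python
def build_adjacency(concepts, child_parent_pairs, symmetric=True):
    """Build a 0/1 adjacency matrix from taxonomy edges.

    Assumption: the taxonomy is given as (child, parent) pairs over a
    fixed list of concept names. Whether the paper's A is symmetric is
    not stated in the excerpt, so it is left as a flag here.
    """
    index = {name: i for i, name in enumerate(concepts)}
    n = len(concepts)
    A = [[0] * n for _ in range(n)]
    for child, parent in child_parent_pairs:
        i, j = index[child], index[parent]
        A[j][i] = 1          # edge from parent to child
        if symmetric:
            A[i][j] = 1      # mirror the edge for an undirected graph
    return A


# Hypothetical toy taxonomy: entity -> animal -> {dog, cat}
concepts = ["entity", "animal", "dog", "cat"]
edges = [("animal", "entity"), ("dog", "animal"), ("cat", "animal")]

A = build_adjacency(concepts, edges)
A_directed = build_adjacency(concepts, edges, symmetric=False)
```

With the symmetric option, each of the three taxonomy edges contributes two nonzero entries; the directed variant keeps only the parent-to-child entry, and sibling concepts such as "dog" and "cat" stay unconnected in both cases.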