Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data

Authors: Zhiyuan Chen, Bing Liu

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results using product reviews from 50 domains demonstrate the effectiveness of the proposed approach. This section evaluates the proposed LTM model and compares it with four state-of-the-art baselines.
Researcher Affiliation | Academia | Zhiyuan Chen (czyuanacm@gmail.com), Department of Computer Science, University of Illinois at Chicago; Bing Liu (liub@cs.uic.edu), Department of Computer Science, University of Illinois at Chicago
Pseudocode | Yes | Algorithm 1: Prior Topics Generation(D); Algorithm 2: LTM(Dt, S); Algorithm 3: Knowledge Mining(At, S). A simplified sketch of the knowledge mining step appears after this table.
Open Source Code | Yes | The dataset and the code are publicly available at the authors' websites.
Open Datasets | Yes | We have created a large dataset containing 50 review collections from 50 product domains crawled from Amazon.com. Each domain has 1,000 (1K) reviews. The dataset and the code are publicly available at the authors' websites.
Dataset Splits | No | The paper describes "Test Settings" (Setting 1 and Setting 2) for evaluation and discusses which domains provide prior knowledge and which are used for testing, but it does not specify explicit train/validation/test splits with percentages or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, memory amounts, or types of computing clusters used for running the experiments.
Software Dependencies | No | The paper mentions models like LDA, but it does not provide a list of specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch, or specific libraries) used in their implementation or experiments.
Experiment Setup | Yes | For all models, posterior estimates of latent variables were taken with a sampling lag of 20 iterations in the post-burn-in phase (first 200 iterations for burn-in), with 2,000 iterations in total. The parameters of all topic models are set as α = 1, β = 0.1, T = 15. For the parameters of LTM, the top 15 words of each topic were used to represent the topic in the topic matching process and also in frequent itemset mining. The minimum support threshold is empirically set to min(5, 0.4 × #Trans). The parameter π in Algorithm 3 is empirically set to 7.0. The parameter µ in Equation 4 is set to 0.3, which determines the extent of promotion of words in a pk-set using the GPU model. These reported values are collected into a configuration sketch after this table.
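
The hyperparameters quoted in the experiment setup row can be restated in code form for quick reference. This is a minimal sketch: the values are the ones reported above, but the variable and key names are illustrative assumptions and do not come from the authors' released code.

    # Hyperparameters as reported in the experiment setup above; key names
    # are illustrative only, not taken from the authors' code.
    LTM_CONFIG = {
        "num_topics": 15,           # T
        "alpha": 1.0,               # Dirichlet prior on document-topic distributions
        "beta": 0.1,                # Dirichlet prior on topic-word distributions
        "total_iterations": 2000,   # Gibbs sampling iterations in total
        "burn_in_iterations": 200,  # first iterations discarded as burn-in
        "sampling_lag": 20,         # lag between posterior estimates after burn-in
        "top_words_per_topic": 15,  # words used for topic matching and itemset mining
        "pi": 7.0,                  # parameter pi in Algorithm 3
        "mu": 0.3,                  # promotion strength mu in Equation 4 (GPU model)
    }

    def min_support(num_transactions):
        # Minimum support threshold for frequent itemset mining,
        # min(5, 0.4 * #Trans) as quoted above.
        return min(5, 0.4 * num_transactions)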
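
To make the pseudocode row concrete, here is a minimal, self-contained sketch of the knowledge mining idea behind Algorithm 3: match the current domain's topics against prior topics from other domains, then mine frequent word co-occurrences (must-link sets) from the matched prior topics. The function names, the overlap-based matching test, the restriction to word pairs, and the toy data are all simplifying assumptions made for illustration; the paper's actual matching and itemset mining procedures are more general.

    from collections import Counter
    from itertools import combinations

    def match_prior_topics(current_top_words, prior_topics, min_overlap=3):
        # Keep prior topics (from other domains) whose top words share at least
        # `min_overlap` words with the current topic; this overlap test is only
        # a stand-in for the paper's topic matching step.
        cur = set(current_top_words)
        return [p for p in prior_topics if len(cur & set(p)) >= min_overlap]

    def mine_must_link_pairs(matched_topics):
        # Treat each matched prior topic's top words as one transaction and mine
        # frequent 2-itemsets (must-link word pairs); the paper mines itemsets
        # of arbitrary size, so pairs are a simplification.
        min_sup = min(5, 0.4 * len(matched_topics))
        counts = Counter()
        for topic in matched_topics:
            counts.update(combinations(sorted(set(topic)), 2))
        return [pair for pair, count in counts.items() if count >= min_sup]

    # Toy example: made-up top words standing in for LDA output.
    prior_topics = [
        ["battery", "life", "charge", "power", "hour"],
        ["battery", "charge", "power", "long", "day"],
        ["battery", "life", "power", "charge", "short"],
        ["screen", "display", "bright", "color", "size"],
    ]
    current_topic = ["battery", "power", "charge", "weak", "life"]

    matched = match_prior_topics(current_topic, prior_topics)
    print(mine_must_link_pairs(matched))

Running the toy example prints word pairs such as ("battery", "charge") that co-occur in enough matched prior topics to pass the min(5, 0.4 × #Trans) support threshold; in LTM, such mined sets are then fed back into Gibbs sampling through the Generalized Pólya Urn (GPU) model to promote words that should belong to the same topic.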