IMS-DTM: Incremental Multi-Scale Dynamic Topic Models

Authors: Xilun Chen, K. Selcuk Candan, Maria Luisa Sapino

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experimental evaluations of the efficiency and effectiveness of the proposed IMS-DTM algorithm. All experiments were conducted in Matlab 2015b on an Intel Core i5-2400 machine with 8GB memory. In these experiments, we set the number, S, of scales to 4. The default Dirichlet prior for the topics at time epoch t is set to 0.05 · D̄t / K, where D̄t is the average document length at time epoch t. We compare the proposed IMS-DTM (which we refer to as multi-past multi-current DTM, MPMC) with the multi-scale dynamic topic model proposed in (Iwata et al. 2010). Since that model considers only multiple scales of the past, we refer to it as multi-past single-current DTM (MPSC). We also consider a baseline dynamic topic model with single-past and single-current scales (SPSC), proposed in (Blei and Lafferty 2006), and a single-past multi-current (SPMC) approach, implemented based on (Canini, Shi, and Griffiths 2009). In the rest of this section, we refer to our approach as the multi-past multi-current approach.
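The quoted prior setting can be illustrated with a short numeric sketch. This assumes the garbled expression "Dt 0.05 K" groups as a symmetric scalar 0.05 · D̄t / K; the function and variable names are ours, not the paper's.

```python
def topic_prior(avg_doc_len: float, num_topics: int) -> float:
    """Symmetric Dirichlet prior for the topics at one time epoch,
    following the quoted setting: 0.05 * (average document length) / K.
    The exact grouping of the constants is our assumption."""
    return 0.05 * avg_doc_len / num_topics

# Example: an epoch whose documents average 1000 tokens, with K = 50 topics.
print(topic_prior(1000.0, 50))  # -> 1.0
```

Note that under this reading the prior adapts per epoch: longer average documents yield a larger (smoother) prior, scaled down as the topic count K grows.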
Researcher Affiliation | Academia | Xilun Chen, K. Selçuk Candan (Arizona State University, Tempe, AZ, USA; xilun.chen@asu.edu, candan@asu.edu); Maria Luisa Sapino (University of Torino, Torino, Italy; mlsapino@di.unito.it)
Pseudocode | Yes | Algorithm 1: IMS-DTM Algorithm. Input: streaming corpus D; number of topics K; number of time epochs T; number of multi-scale epochs S; number of Gibbs sampling iterations iter; hyperparameter update frequency iterUpdate. Output: the incremental multi-scale dynamic topic model M.
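The inputs and output listed for Algorithm 1 suggest a control-flow skeleton like the following. This is a hedged sketch reconstructed only from the stated parameters: the function and variable names are ours, and the Gibbs update and hyperparameter re-estimation are deliberate stubs, not the paper's actual sampling equations.

```python
import random

def ims_dtm(corpus_stream, K, T, S, iters, iter_update, seed=0):
    """Skeleton of an incremental multi-scale dynamic topic model run.

    corpus_stream: iterable of epochs, each a list of tokenized documents.
    K: number of topics; T: number of time epochs; S: number of scales;
    iters: Gibbs sampling iterations per epoch;
    iter_update: how often (in iterations) hyperparameters are re-estimated.
    Returns one placeholder model per processed epoch.
    """
    rng = random.Random(seed)
    models = []
    for t, epoch_docs in zip(range(T), corpus_stream):
        # Topic assignments for this epoch, initialized at random.
        z = [[rng.randrange(K) for _ in doc] for doc in epoch_docs]
        for it in range(iters):
            for d, doc in enumerate(epoch_docs):
                for n, _word in enumerate(doc):
                    # Stub for the collapsed Gibbs update, which per the
                    # paper conditions on S scales of past/current models.
                    z[d][n] = rng.randrange(K)
            if (it + 1) % iter_update == 0:
                pass  # stub: hyperparameter (prior) re-estimation
        models.append({"epoch": t, "assignments": z})
    return models
```

Processing epochs one at a time from the stream, rather than refitting on the full history, is what the "incremental" in the algorithm's name appears to refer to.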
Open Source Code | No | The paper does not include an unambiguous statement about releasing the source code for the described methodology, nor a direct link to a code repository.
Open Datasets | Yes | NIPS Data: the NIPS data, representing web-based scientific data streams, was obtained from the UCI Machine Learning Repository Bag of Words Data Set. NYSK Data: the NYSK (New York v. Strauss-Kahn) data set, also from the UCI Machine Learning Repository, represents web-based news data streams; it is a collection of English news articles about the case relating to allegations of sexual assault against the former IMF director Dominique Strauss-Kahn in May 2011. Apple Stock Data: Apple stock data, representing web-based financial data streams, is a numerical time series from 1981 to 2015 obtained from the Quandl website.
Dataset Splits | No | The paper describes the datasets and epoch lengths, but it does not specify explicit training, validation, and testing splits (e.g., percentages, sample counts, or predefined splits with citations) that would be needed for reproduction.
Hardware Specification | Yes | All experiments were conducted in Matlab 2015b on an Intel Core i5-2400 machine with 8GB memory.
Software Dependencies | Yes | All experiments were conducted in Matlab 2015b on an Intel Core i5-2400 machine with 8GB memory.
Experiment Setup | Yes | In these experiments, we set the number, S, of scales to 4. The default Dirichlet prior for the topics at time epoch t is set to 0.05 · D̄t / K, where D̄t is the average document length at time epoch t. ... For this data, the default epoch length is 100 documents. The number, K, of latent topics is set to 50, in line with (Iwata et al. 2010) for easy comparison. ... For this data, the default epoch length is 45 documents. ... we set K to 20. ... we set the target number, K, of topics to 5.
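The quoted per-dataset settings can be collected into a single configuration sketch. The dictionary layout and key names are ours, and the association of each setting with a dataset is our assumption, based on the order in which the three datasets are listed above; only numbers stated in the quote appear here.

```python
# Experiment settings quoted from the paper, arranged as a config dict.
# Keys, field names, and the dataset-to-setting mapping are our own
# illustration (assumed from listing order), not the paper's code.
EXPERIMENT_CONFIG = {
    "shared": {"num_scales": 4, "prior_scale": 0.05},  # S = 4; prior = 0.05 * avg_doc_len / K
    "nips": {"epoch_length_docs": 100, "num_topics": 50},
    "nysk": {"epoch_length_docs": 45, "num_topics": 20},
    "apple_stock": {"num_topics": 5},
}

for name, cfg in EXPERIMENT_CONFIG.items():
    print(name, cfg)
```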