Learning Hidden Markov Models from Pairwise Co-occurrences with Application to Topic Modeling
Authors: Kejun Huang, Xiao Fu, Nicholas Sidiropoulos
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, "Validation on Synthetic Data": "We show the total variation distance between the ground truth probabilities $\Pr[X_{t+1} \mid X_t]$ and $\Pr[Y_t \mid X_t]$ and their estimations $\widehat{\Pr}[X_{t+1} \mid X_t]$ and $\widehat{\Pr}[Y_t \mid X_t]$ using various methods. The result is shown in Figure 4. As we can see, the proposed method indeed works best, obtaining almost perfect recovery when sample size is above $10^8$." |
| Researcher Affiliation | Academia | University of Minnesota, Minneapolis, MN 55455; Oregon State University, Corvallis, OR 97331; University of Virginia, Charlottesville, VA 22904 |
| Pseudocode | Yes | Algorithm 1 Proposed Algorithm |
| Open Source Code | No | The paper states that "the in-line implementation of this tailored Newton's method THETAUPDATE and the detailed derivation can be found in the supplementary material," but it gives no explicit statement of, or link to, open-source code for the full method. |
| Open Datasets | Yes | On the Reuters21578 data set obtained at (Mimaroglu, 2007) |
| Dataset Splits | No | The paper mentions using the Reuters21578 dataset and synthetic data but does not explicitly provide details about training, validation, or test dataset splits or cross-validation setup. |
| Hardware Specification | No | The paper mentions that 'simulations are conducted in MATLAB using the HMM toolbox' but does not provide any specific hardware details such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'MATLAB', 'HMM toolbox', and 'Tensorlab', but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Fixing N = 100 and K = 20, the transition probabilities are synthetically generated from a random exponential matrix of size K × K followed by row-normalization; for the emission probabilities, approximately 50% of the entries in the N × K random exponential matrices are set to zero before normalizing the columns... We let the number of HMM realizations go from $10^6$ to $10^8$... Algorithm 1: 1: initialize M using (Huang et al., 2016a); 2: initialize $\Theta \leftarrow \frac{1}{K(K+1)}\left(I + \mathbf{1}\mathbf{1}^{\top}\right)$ |
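The synthetic setup described in the Experiment Setup row can be sketched in a few lines. This is a minimal standard-library Python sketch, not the authors' MATLAB code. It assumes that "random exponential" means i.i.d. Exp(1) entries and that the roughly 50% sparsity comes from zeroing each emission entry independently with probability 0.5. The helper names `random_exponential_matrix` and `generate_hmm_parameters` are hypothetical.

```python
import random


def random_exponential_matrix(rows, cols, rng):
    # Assumption: "random exponential" = i.i.d. Exp(1) entries.
    return [[rng.expovariate(1.0) for _ in range(cols)] for _ in range(rows)]


def generate_hmm_parameters(N=100, K=20, zero_frac=0.5, seed=0):
    """Hypothetical generator for synthetic HMM parameters.

    Returns (A, M): A is K x K with A[k][j] = Pr[X_{t+1} = j | X_t = k];
    M is N x K with M[n][k] = Pr[Y_t = n | X_t = k].
    """
    rng = random.Random(seed)

    # Transition probabilities: K x K exponential matrix, then row-normalize.
    A = random_exponential_matrix(K, K, rng)
    A = [[a / sum(row) for a in row] for row in A]

    # Emission probabilities: N x K exponential matrix with each entry
    # zeroed independently with probability zero_frac (an assumption).
    M = random_exponential_matrix(N, K, rng)
    for n in range(N):
        for k in range(K):
            if rng.random() < zero_frac:
                M[n][k] = 0.0

    # Column-normalize so each hidden state emits a valid distribution.
    for k in range(K):
        col_sum = sum(M[n][k] for n in range(N))
        if col_sum == 0.0:
            # Guard (hypothetical choice): an all-zero column cannot be
            # normalized, so put all of its mass on one random symbol.
            M[rng.randrange(N)][k] = 1.0
            col_sum = 1.0
        for n in range(N):
            M[n][k] /= col_sum
    return A, M
```

Rows of `A` sum to one because each row is a distribution over the next hidden state, while columns of `M` sum to one because each column is a distribution over the N observed symbols for one hidden state.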
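The evaluation metric quoted in the Research Type row can be illustrated with a short sketch. The total variation distance between two distributions on a finite set is half their L1 distance. The excerpt does not say how the paper aggregates this over the conditioning states, so the hypothetical helper `mean_conditional_tv` simply averages it. The sketch also assumes the estimated hidden states have already been matched to the ground-truth states, because HMM state labels are only identifiable up to a permutation.

```python
def tv_distance(p, q):
    """Total variation distance between two pmfs on the same finite support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def mean_conditional_tv(P_true, P_est, axis="rows"):
    """Average TV distance between matching conditional distributions.

    axis="rows": each row is a pmf, e.g. Pr[X_{t+1} | X_t = k].
    axis="cols": each column is a pmf, e.g. Pr[Y_t | X_t = k].
    Assumes the hidden states of P_est are already aligned with P_true.
    """
    if axis == "cols":
        P_true = [list(col) for col in zip(*P_true)]
        P_est = [list(col) for col in zip(*P_est)]
    elif axis != "rows":
        raise ValueError("axis must be 'rows' or 'cols'")
    dists = [tv_distance(p, q) for p, q in zip(P_true, P_est)]
    return sum(dists) / len(dists)
```

For example, `tv_distance([0.5, 0.5], [0.75, 0.25])` evaluates to 0.25. Identical distributions give 0 and distributions with disjoint supports give 1, so "almost perfect recovery" corresponds to values near 0.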
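The initialization on line 2 of Algorithm 1 can be checked directly. This assumes $\Theta$ is read as a $K \times K$ joint probability matrix over pairs of consecutive hidden states, an interpretation that the excerpt does not spell out. Under that reading, the initial value is entrywise nonnegative and sums to one:

$$
\Theta^{(0)} = \frac{1}{K(K+1)}\left(I + \mathbf{1}\mathbf{1}^{\top}\right),
\qquad
\mathbf{1}^{\top}\Theta^{(0)}\mathbf{1}
= \frac{\mathbf{1}^{\top}\mathbf{1} + \left(\mathbf{1}^{\top}\mathbf{1}\right)^{2}}{K(K+1)}
= \frac{K + K^{2}}{K(K+1)} = 1.
$$

Each diagonal entry equals $2/(K(K+1))$ and each off-diagonal entry equals $1/(K(K+1))$. For $K = 20$ these are $1/210$ and $1/420$, and $20 \cdot \tfrac{2}{420} + 380 \cdot \tfrac{1}{420} = 1$.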