Learning Overcomplete HMMs

Authors: Vatsal Sharan, Sham M. Kakade, Percy S. Liang, Gregory Valiant

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We performed some simulations to understand how the length of cycles in the transition matrix and the probability mass assigned to short cycles affects the condition number of the likelihood matrix A; recall that the condition number of A determines the stability of the method of moments approach. We take the number of hidden states n = 128, and let P128 be a cycle on the n hidden states (as in Fig. 1a). Let Pc be a union of short cycles of length c on the n states (refer to Fig. 2b for an example). We take the transition matrix to be T = ϵPc + (1 ϵ)P128 for different values of c and ϵ. Fig. 3a shows that the condition number of A becomes worse and hence learning requires more samples if the cycles are shorter in length, and if more probability mass is assigned to the short cycles, hinting that our conditions are perhaps not be too stringent.
Researcher Affiliation Academia Vatsal Sharan Stanford University vsharan@stanford.edu Sham Kakade University of Washington sham@cs.washington.edu Percy Liang Stanford University pliang@cs.stanford.edu Gregory Valiant Stanford University valiant@stanford.edu
Pseudocode Yes More details about tensor decomposition and the method of moments approach to learning HMMs can be found in Appendix A. We refer the reader to Appendix A for more details, specifically Algorithm 1.
Open Source Code No The paper does not provide any statements or links indicating that open-source code for the methodology is available.
Open Datasets No The paper describes simulations but does not specify the use of any publicly available or open datasets with access information. It details how the simulated data/models were constructed rather than acquired from a public source.
Dataset Splits No The paper describes the setup of simulated HMMs for analysis but does not mention any training, validation, or test dataset splits.
Hardware Specification No The paper mentions 'simulations' but does not provide any specific details about the hardware used to conduct these simulations.
Software Dependencies No The paper does not provide specific software names with version numbers for reproducibility.
Experiment Setup Yes We take the number of hidden states n = 128, and let P128 be a cycle on the n hidden states (as in Fig. 1a). Let Pc be a union of short cycles of length c on the n states (refer to Fig. 2b for an example). We take the transition matrix to be T = ϵPc + (1 ϵ)P128 for different values of c and ϵ. ... Define Gd as the adjacency matrix of a directed regular graph with degree d. We take the transition matrix T = ϵGd + (1 ϵd)P128.