Learning Discrete Latent Variable Structures with Tensor Rank Conditions

Authors: Zhengming Chen, Ruichu Cai, Feng Xie, Jie Qiao, Anpeng Wu, Zijian Li, Zhifeng Hao, Kun Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we aim to establish a general identification criterion for discrete latent structures in cases where latent structures exhibit flexible dependencies, while also developing a simple but robust structure learning algorithm. To achieve this, we explore a tensor rank condition on the contingency tables of an observed variable set Xp to probe the latent causal structure from observed data. Interestingly, as shown in Fig. 1, we find that the rank of the contingency table of the joint distribution P(X1, X2) is deeply connected to the support of a variable L (not necessarily among X1 and X2) that d-separates X1 and X2. Based on this observation, we first develop a general tensor rank condition for the discrete causal model and show that such a rank is determined by the minimal support of a specific conditioning set (not necessarily in Xp) that d-separates all variables in Xp. These findings suggest the possibility of identifying the discrete latent variable structure. We further propose a discrete latent structure model that accommodates more general latent structures and show that the discrete latent variable structure can be identified locally and iteratively through tensor rank conditions. Subsequently, we present an identification algorithm to complete the identifiability of discrete latent structure models, including the measurement model and the structure model. We theoretically show that under proper causal assumptions, such as faithfulness and the Markov assumption, the measurement model is fully identifiable and the structure model can be identified up to a Markov equivalence class. The contributions of this work are three-fold. (1) We first establish a connection between the tensor rank condition and graphical patterns in a general discrete causal model, including specific d-separation relations. (2) We then exploit the tensor rank condition to learn the discrete latent variable model, allowing flexible relations between latent variables. (3) We present a structure learning algorithm using tensor rank conditions and demonstrate the effectiveness of the proposed algorithm through simulation studies.
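The rank-support connection quoted above can be checked numerically. Below is a minimal sketch (our illustration, not the paper's code): when a latent variable L with support size r d-separates X1 and X2, the joint table P(X1, X2) is a sum of r rank-one terms, so its matrix rank is at most r.

```python
# Illustrative check of the rank-support connection (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
r, d1, d2 = 3, 4, 4  # latent support size; cardinalities of X1 and X2

p_l = rng.dirichlet(np.ones(r))                    # P(L)
p_x1_given_l = rng.dirichlet(np.ones(d1), size=r)  # row l holds P(X1 | L=l)
p_x2_given_l = rng.dirichlet(np.ones(d2), size=r)  # row l holds P(X2 | L=l)

# P(X1, X2) = sum_l P(L=l) P(X1 | L=l) P(X2 | L=l): a sum of r rank-one terms.
joint = np.einsum("l,li,lj->ij", p_l, p_x1_given_l, p_x2_given_l)

print(np.linalg.matrix_rank(joint))  # prints 3, the support size of L
```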
Researcher Affiliation | Collaboration | Zhengming Chen1,2, Ruichu Cai1, Feng Xie3, Jie Qiao1, Anpeng Wu2,4, Zijian Li2, Zhifeng Hao1,5, Kun Zhang2,6. 1. School of Computer Science, Guangdong University of Technology; 2. Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence; 3. Department of Applied Statistics, Beijing Technology and Business University; 4. Department of Computer Science and Technology, Zhejiang University; 5. College of Science, Shantou University, Shantou, Guangdong, China; 6. Department of Philosophy, Carnegie Mellon University
Pseudocode | Yes | Algorithm 1: Finding the causal cluster. Input: data from a set of measured variables XG, and the dimension of latent support r. Output: causal cluster C.
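The quoted row gives only the algorithm's interface. The following is a hypothetical sketch of how such a cluster search could look; the helper names are ours, and the paper's statistical rank test is replaced here by a simple reconstruction-error proxy based on a rank-r non-negative CP decomposition.

```python
# Hypothetical sketch of a causal-cluster search (not the authors' code).
from itertools import combinations

import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

def contingency_tensor(data, cols, cards):
    """Empirical joint contingency table over the variables in `cols`."""
    tensor = np.zeros([cards[c] for c in cols])
    for row in data[:, cols]:
        tensor[tuple(row)] += 1
    return tensor / len(data)

def has_tensor_rank(tensor, r, tol=1e-2):
    """Proxy test: does a rank-r non-negative CP fit reconstruct the tensor?"""
    cp = non_negative_parafac(tl.tensor(tensor), rank=r, n_iter_max=1000)
    err = tl.norm(tensor - tl.cp_to_tensor(cp)) / tl.norm(tensor)
    return err < tol

def find_causal_cluster(data, cards, r, size=3):
    """Return the first variable triple whose contingency tensor has rank r."""
    for cols in combinations(range(data.shape[1]), size):
        tensor = contingency_tensor(data, list(cols), cards)
        if has_tensor_rank(tensor, r):
            return list(cols)
    return None
```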
Open Source Code | No | The paper mentions specific implementations used from third-party libraries (e.g., the Tetrad Project package, tensorly, pgmpy) but does not provide a statement or link for the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | For the political efficacy data (Aish and Jöreskog, 1990), by identifying the support of the latent variable to be two, one can identify the causal clusters {NOCARE, TOUCH, INTEREST} and {NOSAY, VOTING, COMPLEX}. These clusters correspond to the latent variables EFFICACY and RESPONSE, respectively. In our implementation, we set the significance level to 0.0015. The result aligns with the ground truth provided in Jöreskog and Sörbom (1996). For the depress dataset, the ground-truth structure (Jöreskog and Sörbom, 1996) includes three latent factors: Self-esteem, Depression, and Impulsiveness, with a total of 204 samples.
Dataset Splits | No | The paper mentions sample sizes like {5k, 10k, 50k} for simulated data and specific sample counts for real-world datasets, but it does not specify explicit train/validation/test splits, percentages, or a splitting methodology for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | Yes | Non-negative CP decomposition: to perform non-negative CP decomposition in our algorithm, we use the implementation from the Python package tensorly, and set the maximum iteration parameter to 1000 and the cvg criterion parameter to "rec_error". Data generation: to generate the probability contingency tables and conditional probability contingency tables for latent and observed variables in our simulation studies, we use the implementation from the Python package pgmpy.
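As a sketch of the reported tensorly usage (our reconstruction, using only the parameters quoted above), the decomposition call would look roughly like this:

```python
# Sketch of the reported tensorly call, reconstructed from the quoted settings.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

tensor = tl.tensor(np.random.rand(4, 4, 4))  # stand-in contingency tensor

# Maximum of 1000 iterations; convergence judged on the reconstruction error.
weights, factors = non_negative_parafac(
    tensor, rank=3, n_iter_max=1000, cvg_criterion="rec_error"
)
```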
Experiment Setup | Yes | In all cases, the data generation process follows the discrete 3PLSM model: (i) we generate the probability contingency table of latent variables in advance, according to different latent structures (e.g., SM1); (ii) we generate the conditional contingency tables of observed variables (conditioned on their latent parents); and (iii) we sample the observed data according to the probability contingency table, where the dimension of latent support r is set to 3, the dimension of all observed variables' support is set to 4, and sample sizes range over {5k, 10k, 50k}. In our implementation, the significance levels for testing the rank of the matrix and tensor are set to 0.005 and 0.05, respectively. The entries of the probability contingency tables are generated randomly in the range [0.1, 0.8] and constrained to sum to one.
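A hypothetical sketch of steps (i)-(iii) with pgmpy follows (the graph and helper names are ours, not the authors'): one latent variable with support r = 3, three observed children with support 4, and forward sampling of 5k rows.

```python
# Hypothetical sketch of the 3PLSM data generation with pgmpy (illustrative).
import numpy as np
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianNetwork
from pgmpy.sampling import BayesianModelSampling

rng = np.random.default_rng(0)
r, d = 3, 4  # latent support size and observed support size

def random_cpd_values(card, evidence_card):
    """Random column-stochastic CPD table, entries drawn from [0.1, 0.8]."""
    values = rng.uniform(0.1, 0.8, size=(card, evidence_card))
    return values / values.sum(axis=0, keepdims=True)  # columns sum to one

# One latent parent L with three observed children X1, X2, X3.
model = BayesianNetwork([("L", "X1"), ("L", "X2"), ("L", "X3")])
model.add_cpds(TabularCPD("L", r, random_cpd_values(r, 1)))
for x in ["X1", "X2", "X3"]:
    model.add_cpds(TabularCPD(x, d, random_cpd_values(d, r),
                              evidence=["L"], evidence_card=[r]))
assert model.check_model()

samples = BayesianModelSampling(model).forward_sample(size=5000)
observed = samples[["X1", "X2", "X3"]]  # drop the latent column
```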