Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables

Authors: Saber Salehkaleybar, AmirEmad Ghassami, Negar Kiyavash, Kun Zhang

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on synthetic data and real-world data show the effectiveness of our proposed algorithm for learning causal models.
Researcher Affiliation Academia Saber Salehkaleybar EMAIL Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran; AmirEmad Ghassami EMAIL Department of ECE, University of Illinois at Urbana-Champaign, Urbana, IL 61801; Negar Kiyavash EMAIL College of Management of Technology, École Polytechnique Fédérale de Lausanne (EPFL); Kun Zhang EMAIL Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213
Pseudocode Yes Algorithm 1
Input: Collection of the sets des_o(V_i), 1 ≤ i ≤ p_o.
Run an over-complete ICA algorithm over observed variables V_o and obtain matrix B'.
for i = 1 : p_r do
    I_i = {k | [B']_{k,i} ≠ 0}
    for j = 1 : p_o do
        if I_i = des_o(V_j) then
            [B̂_o]_{:,j} = B'_{:,i} / [B'_{:,i}]_j
        end if
    end for
end for
Output: B̂_o
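The column-matching step of the quoted pseudocode can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the over-complete ICA step is assumed to have already produced `B_prime`, the descendant sets `des_o` are given as 0-indexed Python sets, and the function name and tolerance are our own choices.

```python
import numpy as np

def match_columns(B_prime, des_o, tol=1e-8):
    """Match each column of the over-complete ICA mixing matrix B' to the
    observed variable whose set of observed descendants equals the column's
    support, then rescale the column so its diagonal entry is 1."""
    p_o = len(des_o)                      # number of observed variables
    p_r = B_prime.shape[1]                # number of recovered columns
    B_o = np.zeros((B_prime.shape[0], p_o))
    for i in range(p_r):
        # Support of column i: indices of its (numerically) nonzero entries.
        support = {k for k in range(B_prime.shape[0])
                   if abs(B_prime[k, i]) > tol}
        for j in range(p_o):
            if support == des_o[j]:
                B_o[:, j] = B_prime[:, i] / B_prime[j, i]
    return B_o

# Toy example (hypothetical): V1 -> V2, so des_o(V1) = {V1, V2}
# and des_o(V2) = {V2}, written 0-indexed below.
B_prime = np.array([[1.0, 0.0],
                    [0.9, 2.0]])
des_o = [{0, 1}, {1}]
B_hat = match_columns(B_prime, des_o)   # columns rescaled to unit diagonal
```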
Open Source Code No The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets Yes We considered the daily closing prices of the following world stock indices from 10/12/2012 to 10/12/2018, obtained from Yahoo financial database: Dow Jones Industrial Average (DJI) in USA, Nikkei 225 (N225) in Japan, Euronext 100 (N100) in Europe, Hang Seng Index (HSI) in Hong Kong, and the Shanghai Stock Exchange Composite Index (SSEC) in China.
Dataset Splits Yes First, for the causal graph in Figure 1, we generated 1000 samples of observed variables V1 and V2... In order to estimate the number of columns of B', we held out 250 of the samples for model selection.
Hardware Specification No The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies No The paper mentions several algorithms such as RICA, lvLiNGAM, Direct-LiNGAM, and FCI, but it does not provide specific version numbers for any of these software dependencies or libraries.
Experiment Setup Yes First, for the causal graph in Figure 1, we generated 1000 samples of observed variables V1 and V2 where nonzero entries of matrix A are equal to 0.9. We utilized the Reconstruction ICA (RICA) algorithm (Le et al., 2011) to solve the over-complete ICA problem as follows: ... parameter λ controls the cost of the penalty term. We estimated matrix B' by UΣ^{1/2}Z where Z is the optimal solution of the above optimization problem. In order to estimate the number of columns of B', we held out 250 of the samples for model selection. More specifically, we solved the over-complete ICA problem for different numbers of columns, evaluated the fitness of each model by computing the objective function of RICA over the hold-out set, and selected the model with minimum cost. In order to check whether an entry is equal to zero, we used the bootstrapping method (Efron and Tibshirani, 1994), which generates 10 bootstrap samples by sampling with replacement from training data. For each bootstrap sample, we executed the RICA algorithm to obtain an estimation of B'. ... Afterwards, we used a t-test with confidence level of 95% to check whether an entry is equal to zero from the bootstrap samples.
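The bootstrapped zero-entry test quoted above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the estimator passed in is a toy stand-in (the paper re-runs RICA on each bootstrap sample), and the function name and seed are our own.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bootstrap_zero_test(data, estimate_B, n_boot=10, alpha=0.05):
    """For each entry of the estimated matrix, test H0: entry == 0 with a
    one-sample t-test over bootstrap re-estimates (rows resampled with
    replacement), following the procedure described in the quoted setup."""
    n = data.shape[0]
    estimates = []
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]   # bootstrap resample
        estimates.append(estimate_B(sample))
    estimates = np.stack(estimates)                 # (n_boot, *entry_shape)
    _, p_values = stats.ttest_1samp(estimates, popmean=0.0, axis=0)
    return p_values < alpha                         # True where entry nonzero

# Toy stand-in estimator: per-column means (NOT the paper's RICA step).
# Column 0 is centred near 0; column 1 near 0.9, mimicking a nonzero entry.
data = rng.normal(loc=[0.0, 0.9], scale=0.1, size=(1000, 2))
nonzero = bootstrap_zero_test(data, lambda d: d.mean(axis=0))
```

Note that with only 10 bootstrap replicates the t-test has few degrees of freedom, so borderline entries near zero can go either way; clearly nonzero entries (like the 0.9 column here) are flagged reliably.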