Learning to Discover Sparse Graphical Models
Authors: Eugene Belilovsky, Kyle Kastner, Gael Varoquaux, Matthew B. Blaschko
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations focus on the challenging high dimensional settings in which p > n and consider both synthetic data and real data from genetics and neuroimaging. |
| Researcher Affiliation | Academia | Eugene Belilovsky, INRIA Galen, University of Paris-Saclay, France (eugene.belilovsky@inria.fr); Kyle Kastner, MILA Lab, University of Montreal, Canada (kyle.kastner@umontreal.ca); Gael Varoquaux, INRIA Parietal, Saclay, France (gael.varoquaux@inria.fr); Matthew B. Blaschko, Center for Processing Speech and Images, KU Leuven, Belgium (matthew.blaschko@esat.kuleuven.be) |
| Pseudocode | Yes | Algorithm 1: Training a GGM edge estimator. For i ∈ {1, .., N} do: Sample G_i ~ P(G); Sample Σ_i ~ P(Σ | G = G_i); X_i ← {x_j ~ N(0, Σ_i)}_{j=1}^{n}; Construct (Y_i, Σ̂_i) pair from (G_i, X_i); end for. Select function class F (e.g., CNN). Optimize: min_{f ∈ F} (1/N) Σ_{k=1}^{N} l̂(f(Σ̂_k), Y_k). (A minimal code sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing its source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use the ABIDE dataset (Di Martino et al., 2014), a large scale resting-state fMRI dataset. It gathers brain scans from 539 individuals suffering from autism spectrum disorder and 573 controls over 16 sites. |
| Dataset Splits | No | Each network is trained continuously with new samples generated until the validation error saturates. For a given precision matrix we generate 5 possible X samples to be used as training data, with a total of approximately 100K training samples used for each network. The networks are optimized using ADAM (Kingma & Ba, 2015) coupled with cross-entropy loss as the objective function (cf. Sec. 2.1). We use batch normalization at each layer. Additionally, we found that using the absolute value of the true partial correlations as labels, instead of hard binary labels, improves results. (A sketch of this soft-label construction follows the table.) |
| Hardware Specification | No | We compute the average execution time of our method compared to Graph Lasso and BDGraph on a CPU in Table 4. |
| Software Dependencies | No | We compared our learned estimator against the scikit-learn (Pedregosa et al., 2011) implementation of Graphical Lasso... We used the BDGraph R-package... as well as the R-package rags2ridges (Peeters et al., 2015). (An illustrative Graphical Lasso baseline call follows the table.) |
| Experiment Setup | Yes | We train networks taking in 39, 50, and 500 node graphs. ... In all cases we have 50 feature maps of 3x3 kernels. The 39- and 50-node networks use 6 convolutional layers with d_k = k + 1; the 500-node network uses 8 convolutional layers with d_k = 2k + 1. We use ReLU activations. The last layer has a 1x1 convolution and a sigmoid outputting a value of 0 to 1 for each edge. ... The networks are optimized using ADAM (Kingma & Ba, 2015) coupled with cross-entropy loss as the objective function (cf. Sec. 2.1). We use batch normalization at each layer. (An architecture sketch follows the table.) |
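
As a companion to the Algorithm 1 row above, here is a minimal sketch of the training-pair generation, assuming NumPy and scikit-learn's `make_sparse_spd_matrix` as the sampler for sparse precision matrices; the function names, dimensions, and sparsity parameter are illustrative and not taken from the paper.

```python
# Hypothetical sketch of Algorithm 1 (training-pair generation for a GGM edge
# estimator). Names and parameter values are illustrative, not from the paper.
import numpy as np
from sklearn.datasets import make_sparse_spd_matrix


def sample_training_pair(p=39, n=35, alpha=0.95, seed=None):
    """Sample one (empirical covariance, edge-label) pair as in Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Sample a sparse precision matrix K; its off-diagonal support encodes the graph G.
    K = make_sparse_spd_matrix(p, alpha=alpha,
                               random_state=int(rng.integers(1 << 31)))
    Sigma = np.linalg.inv(K)
    # Draw n i.i.d. observations x_j ~ N(0, Sigma) and form the empirical covariance.
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Sigma_hat = X.T @ X / n
    # Binary edge labels Y: off-diagonal non-zeros of the precision matrix.
    Y = (np.abs(K) > 1e-8).astype(np.float32)
    np.fill_diagonal(Y, 0)
    return Sigma_hat, Y


# Build N such pairs; an estimator f (e.g. a CNN) is then fit by minimizing
# (1/N) * sum_k loss(f(Sigma_hat_k), Y_k), with cross-entropy loss per the paper.
pairs = [sample_training_pair(seed=i) for i in range(8)]
```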
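The dataset-splits row notes that the absolute values of the true partial correlations are used as soft labels instead of hard binary labels. A hedged sketch of that label construction, assuming the true precision matrix `K` is available from the simulation:

```python
# Sketch of soft edge labels from absolute partial correlations. The helper
# name is illustrative; the formula rho_ij = -K_ij / sqrt(K_ii * K_jj) is standard.
import numpy as np


def soft_edge_labels(K):
    """Return |partial correlations| as soft labels in [0, 1]."""
    d = np.sqrt(np.diag(K))
    rho = -K / np.outer(d, d)          # partial correlations
    labels = np.abs(rho)
    np.fill_diagonal(labels, 0.0)      # no self-edges
    return labels
```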
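For the software-dependencies row, an illustrative call to the scikit-learn Graphical Lasso baseline could look like the following (class name per recent scikit-learn releases, where older versions used `GraphLassoCV`); the data here is a synthetic stand-in, not the paper's experimental data.

```python
# Illustrative Graphical Lasso baseline with scikit-learn.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((35, 39))      # n = 35 samples, p = 39 variables (p > n)
model = GraphicalLassoCV().fit(X)
precision = model.precision_           # estimated sparse precision matrix
edges = np.abs(precision) > 1e-8       # recovered graph support
```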
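For the experiment-setup row, a minimal architecture sketch, assuming PyTorch (the paper does not name a framework) and illustrative class and variable names: 6 convolutional layers of 50 feature maps with 3x3 kernels, dilation d_k = k + 1, batch normalization, ReLU, and a final 1x1 convolution with a sigmoid giving a per-edge probability.

```python
# Hedged sketch of the dilated-CNN edge estimator described in the setup row.
import torch
import torch.nn as nn


class EdgeEstimatorCNN(nn.Module):
    def __init__(self, n_layers=6, n_maps=50):
        super().__init__()
        layers, in_ch = [], 1  # input: empirical covariance as a 1-channel p x p "image"
        for k in range(1, n_layers + 1):
            d = k + 1  # dilation schedule d_k = k + 1 (the 500-node net uses 2k + 1)
            layers += [
                nn.Conv2d(in_ch, n_maps, kernel_size=3, dilation=d, padding=d),
                nn.BatchNorm2d(n_maps),
                nn.ReLU(inplace=True),
            ]
            in_ch = n_maps
        layers += [nn.Conv2d(in_ch, 1, kernel_size=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, sigma_hat):          # sigma_hat: (batch, 1, p, p)
        return self.net(sigma_hat)         # per-edge probabilities in [0, 1]


model = EdgeEstimatorCNN()
probs = model(torch.randn(2, 1, 39, 39))   # e.g. 39-node graphs
```

Training would pair this with binary cross-entropy (or the soft labels above) and the ADAM optimizer, as stated in the quoted setup.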