Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Extremal graphical modeling with latent variables via convex optimization
Authors: Sebastian Engelke, Armeen Taeb
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We highlight the improved performances of our approach on synthetic and real data. Keywords: conditional independence, extreme value theory, latent variable model, multivariate Pareto distribution, sparsity. 5. Experimental demonstrations. In our numerical experiments, we use eglatent as a model selection procedure and perform a second refitting step on the selected model structure to estimate the model parameters; see Appendix I for details. Code to reproduce our results can be found at https://github.com/sebastian-engelke/extremal_latent_learning. 5.1 Synthetic simulations. We illustrate the utility of our method for recovering the subgraph among the observed variables and the number of latent variables on synthetic data. We compare the performance of our eglatent method to eglearn by Engelke et al. (2022c) for learning extremal graphical models. ... Figure 3 summarizes the performance of the methods on 50 independent trials for the different sample sizes and different numbers of latent variables. ... 5.2 Real data application. We apply our latent Hüsler-Reiss model to analyze large flight delays. |
| Researcher Affiliation | Academia | Sebastian Engelke EMAIL Research Center for Statistics, University of Geneva. Armeen Taeb EMAIL Department of Statistics, University of Washington. |
| Pseudocode | No | The paper describes the methodology and optimization procedures mathematically and in natural language, but does not include any explicitly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our eglatent method is implemented in the R package graphicalExtremes (Engelke et al., 2022a) and all numerical results and figures can be reproduced using the code on https://github.com/sebastian-engelke/extremal_latent_learning. |
| Open Datasets | Yes | We use a data set from the R package graphicalExtremes (Engelke et al., 2022a) with p = 29 airports in the southern U.S. shown in the left panel of Figure 5. |
| Dataset Splits | Yes | To compare the different model fits and to select the optimal value for the tuning parameter λn, we must compute the likelihood of the fitted models on an independent validation set. To this end, we split the data chronologically into five equally large folds and perform cross-validation by leaving one fold out (validation data) and fitting on the remaining four folds (training data). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | Our eglatent method is implemented in the R package graphicalExtremes (Engelke et al., 2022a). While this specifies the primary software package, it does not provide specific version numbers for R or any other ancillary libraries or solvers used in the computational environment. |
| Experiment Setup | Yes | In this synthetic example, we generated 2000 approximate observations from an extremal graphical model with h = 2 latent variables and a cycle graph among p = 30 observed variables, and fitted both methods for different values of the regularization parameters; see Section 5.1.1 for details on the setup. ... When deploying our eglatent estimator in (9), we fix γ = 4 to a reasonable default value; in Appendix J.2, we demonstrate the robustness of our results to different values of γ. Concerning the regularization parameter λn, which also appears in the eglearn method, in both methods, it is chosen either by validation likelihood on a separate dataset of size n or by an oracle approach maximizing the F-score for the sub-graph among observed variables. ... We report here the results for the exceedance threshold of q = 0.90 (i.e., 1 − k/n = 0.90) resulting in k = 360 marginal exceedances for the computation of the empirical variogram Γ̂_O; see Section 3.3.1. |
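
The chronological five-fold split quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the function names `chronological_folds` and `leave_one_fold_out` are hypothetical, and the sketch assumes the observations are already sorted by time so that contiguous index blocks correspond to consecutive periods.

```python
def chronological_folds(n_samples, n_folds=5):
    """Split indices 0..n_samples-1 into contiguous, time-ordered folds.

    Hypothetical helper: folds are as equal in size as possible, with any
    remainder spread over the earliest folds.
    """
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def leave_one_fold_out(n_samples, n_folds=5):
    """Yield (training, validation) index lists, holding out one fold at a time."""
    folds = chronological_folds(n_samples, n_folds)
    for held_out in range(n_folds):
        validation = folds[held_out]
        # Training data: all indices from the remaining folds, in time order.
        training = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield training, validation


if __name__ == "__main__":
    for training, validation in leave_one_fold_out(10, n_folds=5):
        print("validation:", validation, "training:", training)
```

In a setup like the one quoted, the model would be fitted on each `training` index set for every candidate value of the tuning parameter and scored by its likelihood on the matching `validation` set; the fitting and scoring steps are not shown here.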