Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Uncovering Causality from Multivariate Hawkes Integrated Cumulants
Authors: Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-François Muzy
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we show, on numerical experiments, that our approach is indeed very robust with respect to the shape of the kernels and gives appealing results on the MemeTracker database and on financial order book data. |
| Researcher Affiliation | Collaboration | Massil Achab (EMAIL), Centre de Mathématiques Appliquées, Ecole polytechnique, Palaiseau, France; Emmanuel Bacry (EMAIL), Centre de Recherche en Mathématique de la Décision, Université Paris-Dauphine, Paris, France, and Centre de Mathématiques Appliquées, Ecole polytechnique, Palaiseau, France; Stéphane Gaïffas (EMAIL), Laboratoire de Probabilités et Modèles Aléatoires, Université Paris-Diderot, Paris, France; Iacopo Mastromatteo (EMAIL), Research-Execution, Capital Fund Management, Paris, France; Jean-François Muzy (EMAIL), Laboratoire Sciences Pour l'Environnement, Université de Corse, Corte, France |
| Pseudocode | Yes | Algorithm 1: Non Parametric Hawkes Cumulant method. Input: $N_t$. Output: $\widehat{G}$. 1: Estimate $\widehat{\Lambda}^i$, $\widehat{C}^{ij}$, $\widehat{K}^{iij}$ from Eqs. (11, 12, 13). 2: Design $\widetilde{L}(R)$ using the computed estimators. 3: Minimize numerically $\widetilde{L}(R)$ so as to obtain $\widehat{R}$. 4: Return $\widehat{G} = \mathbb{I}_d - \widehat{R}^{-1}$. |
| Open Source Code | Yes | An efficient implementation of this algorithm with TensorFlow, see Abadi et al. (2016), is available on GitHub: https://github.com/achab/nphc. |
| Open Datasets | Yes | MemeTracker dataset. We use events of the most active sites from the MemeTracker dataset (footnote 2). This dataset contains the publication times of articles in many websites/blogs from August 2008 to April 2009, and hyperlinks between posts. We extract the top 100 and the top 200 media sites with the largest number of documents, with about 7 million of events. We name MemeTracker100 the 100-dimensional dataset, and MemeTracker200 the 200-dimensional one. We use the links to trace the flow of information and establish an estimated ground truth for the matrix $G$. Indeed, when an hyperlink $j$ appears in a post in website $i$, the link $j$ can be regarded as a direct ancestor of the event. Then, Eq. (2) shows $g^{ij}$ can be estimated by $N_T^{i \leftarrow j} / N_T^j = \#\{\text{links } j \to i\} / N_T^j$. Footnote 2: https://www.memetracker.org/data.html |
| Dataset Splits | No | The paper mentions simulated datasets, MemeTracker datasets (MemeTracker100, MemeTracker200), and financial order book data. While it describes the characteristics and sources of these datasets, it does not explicitly provide training/test/validation splits, percentages, or methodology for partitioning the data. |
| Hardware Specification | Yes | We ran multi-processed versions of the baseline methods on 56 cores, to decrease the computing time. |
| Software Dependencies | No | An efficient implementation of this algorithm with TensorFlow, see Abadi et al. (2016), is available on GitHub: https://github.com/achab/nphc. The paper mentions TensorFlow but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | The parameter $\gamma$ is set to 1/2 on the three blocks as well, but we set three very different $\beta_0$, $\beta_1$ and $\beta_2$ from one block to the other, with ratio $\beta_{i+1}/\beta_i = 10$ and $\beta_0 = 0.1$. The number of events is roughly equal to $10^5$ on average over the nodes. We used $M = 10$ basis functions for both ODE and GC algorithms, and $L = 50$ quadrature points for WH. |
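
Three quantities quoted in the table are simple enough to sketch in NumPy: the per-block decay values from the Experiment Setup row, the last step of Algorithm 1 (recovering the kernel-integral matrix from the estimated R), and the link-count ground truth from the Open Datasets row. This is an illustrative sketch only, not the API of the `nphc` repository; the function names and array layouts are assumptions, and every node is assumed to have at least one event.

```python
import numpy as np


def block_decays(beta0=0.1, ratio=10.0, n_blocks=3):
    """Decay per block, with beta_{k+1} / beta_k = ratio (hypothetical helper)."""
    return beta0 * ratio ** np.arange(n_blocks)


def kernel_integrals_from_R(R):
    """Step 4 of Algorithm 1: G = I_d - R^{-1} (hypothetical helper)."""
    d = R.shape[0]
    return np.eye(d) - np.linalg.inv(R)


def link_count_ground_truth(link_counts, event_counts):
    """Estimate g^{ij} as #{links j -> i} / N_T^j (hypothetical helper).

    Assumed layout: link_counts[i, j] is the number of links to site j
    found in posts of site i; event_counts[j] is the number of events of
    site j, assumed strictly positive.
    """
    link_counts = np.asarray(link_counts, dtype=float)
    event_counts = np.asarray(event_counts, dtype=float)
    # Divide column j by N_T^j via broadcasting over rows.
    return link_counts / event_counts[np.newaxis, :]


# Values from the Experiment Setup row: beta_0 = 0.1, ratio 10, three blocks.
betas = block_decays()

# Round trip on a toy 2x2 matrix: build R = (I - G)^{-1}, then recover G.
G_true = np.array([[0.1, 0.2], [0.0, 0.3]])
R = np.linalg.inv(np.eye(2) - G_true)
G_back = kernel_integrals_from_R(R)

# Toy link counts (made-up numbers, for illustration only).
G_links = link_count_ground_truth([[0, 2], [1, 0]], [4, 8])
```

The round trip relies only on the relation in step 4, so it checks the algebra rather than the estimator itself; the cumulant estimation and the minimization of the objective in steps 1 to 3 are not reproduced here.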