Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Uncovering Causality from Multivariate Hawkes Integrated Cumulants

Authors: Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-François Muzy

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Moreover, we show on numerical experiments that our approach is indeed very robust with respect to the shape of the kernels, and that it gives appealing results on the MemeTracker database and on financial order book data.
Researcher Affiliation | Collaboration | Massil Achab: Centre de Mathématiques Appliquées, Ecole polytechnique, Palaiseau, France. Emmanuel Bacry: Centre de Recherche en Mathématique de la Décision, Université Paris-Dauphine, Paris, France, and Centre de Mathématiques Appliquées, Ecole polytechnique, Palaiseau, France. Stéphane Gaïffas: Laboratoire de Probabilités et Modèles Aléatoires, Université Paris-Diderot, Paris, France. Iacopo Mastromatteo: Research-Execution, Capital Fund Management, Paris, France. Jean-François Muzy: Laboratoire Sciences Pour l'Environnement, Université de Corse, Corte, France.
Pseudocode | Yes | Algorithm 1 (Non-Parametric Hawkes Cumulant method). Input: $N_t$. Output: $\widehat{G}$. 1: Estimate $\widehat{\Lambda}^i$, $\widehat{C}^{ij}$, $\widehat{K}^{iij}$ from Eqs. (11, 12, 13). 2: Design $\widetilde{L}(R)$ using the computed estimators. 3: Minimize $\widetilde{L}(R)$ numerically to obtain $\widehat{R}$. 4: Return $\widehat{G} = I_d - \widehat{R}^{-1}$.
Open Source Code | Yes | An efficient implementation of this algorithm with TensorFlow, see Abadi et al. (2016), is available on GitHub: https://github.com/achab/nphc.
Open Datasets | Yes | MemeTracker dataset. We use events of the most active sites from the MemeTracker dataset (footnote 2: https://www.memetracker.org/data.html). This dataset contains the publication times of articles on many websites/blogs from August 2008 to April 2009, as well as the hyperlinks between posts. We extract the top 100 and the top 200 media sites with the largest number of documents, which gives about 7 million events. We call the 100-dimensional dataset MemeTracker100 and the 200-dimensional one MemeTracker200. We use the links to trace the flow of information and establish an estimated ground truth for the matrix $G$. Indeed, when a hyperlink $j$ appears in a post on website $i$, the link $j$ can be regarded as a direct ancestor of the event. Then, Eq. (2) shows that $g^{ij}$ can be estimated by $N^{i \leftarrow j}_T / N^j_T = \#\{\text{links } j \to i\} / N^j_T$.
Dataset Splits | No | The paper mentions simulated datasets, MemeTracker datasets (MemeTracker100, MemeTracker200), and financial order book data. While it describes the characteristics and sources of these datasets, it does not explicitly provide training/test/validation splits, percentages, or a methodology for partitioning the data.
Hardware Specification | Yes | We ran multi-processed versions of the baseline methods on 56 cores to decrease the computing time.
Software Dependencies | No | An efficient implementation of this algorithm with TensorFlow, see Abadi et al. (2016), is available on GitHub: https://github.com/achab/nphc. The paper mentions TensorFlow but does not specify a version number, nor does it list other software dependencies with version numbers.
Experiment Setup | Yes | The parameter $\gamma$ is set to 1/2 on the three blocks as well, but we set three very different values $\beta_0$, $\beta_1$ and $\beta_2$ from one block to the other, with ratio $\beta_{i+1}/\beta_i = 10$ and $\beta_0 = 0.1$. The number of events is roughly $10^5$ on average over the nodes. We used $M = 10$ basis functions for both the ODE and GC algorithms, and $L = 50$ quadrature points for WH.
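The following minimal numpy sketch connects two rows of the table above: steps 2 and 4 of Algorithm 1 (Pseudocode row), and the ground-truth estimate $g^{ij} \approx \#\{\text{links } j \to i\} / N^j_T$ (Open Datasets row). It makes several assumptions. The integrated-cumulant relations are assumed to take the NPHC form $C = R \,\mathrm{diag}(\Lambda)\, R^\top$ and $K^{iij} = \sum_m \big[(R^{im})^2 C^{jm} + 2 R^{im} R^{jm} C^{im} - 2 \Lambda^m (R^{im})^2 R^{jm}\big]$, with $R = (I_d - G)^{-1}$. They should be checked against the paper before use. The following elements are hypothetical and were chosen for illustration only: the function names `ground_truth_g`, `model_cumulants`, `nphc_loss` and `g_from_r`, the `(j, i)` link format, the toy links, the baseline `mu`, and the weight `kappa`. The sketch omits step 1 (the estimators of Eqs. 11 to 13) and step 3 (numerical minimization, which the authors' implementation performs with TensorFlow).

```python
import numpy as np


def ground_truth_g(links, event_counts):
    """Estimate g^{ij} as #{links j -> i} / N^j_T (hypothetical input format).

    links: iterable of (j, i) pairs, one per hyperlink to site j found in a post on site i.
    event_counts: sequence of N^j_T, the number of posts of site j on [0, T].
    """
    n = np.asarray(event_counts, dtype=float)
    counts = np.zeros((n.size, n.size))
    for j, i in links:
        counts[i, j] += 1.0
    # Divide column j by N^j_T; columns of sites with no events stay at zero.
    return np.divide(counts, n[np.newaxis, :], out=np.zeros_like(counts),
                     where=n[np.newaxis, :] > 0)


def model_cumulants(R, lam):
    """Integrated cumulants implied by R = (I - G)^{-1} and mean intensities lam.

    C[i, j]  = sum_m lam[m] R[i, m] R[j, m]
    Kc[i, j] = K^{iij} = sum_m (R[i, m]^2 C[j, m] + 2 R[i, m] R[j, m] C[i, m]
                                - 2 lam[m] R[i, m]^2 R[j, m])
    """
    L = np.diag(lam)
    C = R @ L @ R.T
    Kc = (R * R) @ C.T + 2.0 * (R * (C - R @ L)) @ R.T
    return C, Kc


def nphc_loss(R, lam_hat, C_hat, Kc_hat, kappa=0.5):
    """Weighted squared mismatch between model and target cumulants (kappa is hypothetical)."""
    C, Kc = model_cumulants(R, lam_hat)
    return (1.0 - kappa) * np.sum((Kc - Kc_hat) ** 2) + kappa * np.sum((C - C_hat) ** 2)


def g_from_r(R):
    """Step 4 of Algorithm 1: G = I_d - R^{-1}."""
    return np.eye(R.shape[0]) - np.linalg.inv(R)


# Toy example: 3 sites with 10 posts each and a few hypothetical hyperlinks.
links = [(0, 1), (0, 1), (1, 2), (1, 2), (1, 2), (2, 0)]
G_gt = ground_truth_g(links, [10, 10, 10])

# Treat G_gt as the true matrix G and build exact (noise-free) cumulants from it.
mu = np.array([0.5, 0.5, 0.5])          # hypothetical baseline intensities
R_true = np.linalg.inv(np.eye(3) - G_gt)
lam = R_true @ mu                        # stationary mean intensities
C_hat, Kc_hat = model_cumulants(R_true, lam)

print(G_gt)
print(nphc_loss(R_true, lam, C_hat, Kc_hat))   # zero at the true R
print(np.allclose(g_from_r(R_true), G_gt))
```

Here the target cumulants are computed from the true $R$, so the loss is exactly zero there and positive elsewhere (for example at the identity). On real data, $\widehat{C}$ and $\widehat{K}^c$ would instead come from the estimators of Eqs. (11, 12, 13), and $\widetilde{L}(R)$ would be minimized with a gradient-based optimizer before applying `g_from_r`.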