Clustering Financial Time Series: How Long Is Enough?

Authors: Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Then, we also give a first empirical answer to the much debated question: How long should the time series be? If too short, the clusters found can be spurious; if too long, dynamics can be smoothed out." (Section 5, "Empirical rates of convergence"; the effect of the sample length T on correlation estimates is illustrated in the first sketch after this table.)
Researcher Affiliation | Collaboration | Gautier Marti (Hellebore Capital Ltd; Ecole Polytechnique), Sébastien Andler (ENS de Lyon; Hellebore Capital Ltd), Frank Nielsen (Ecole Polytechnique, LIX UMR 7161), Philippe Donnat (Hellebore Capital Ltd, Michelin House, London)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "For the simulations, implementation and tutorial available at www.datagrapple.com/Tech, we will consider two models:"
Open Datasets | No | The paper uses simulated time series generated from models and does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset.
Dataset Splits | No | The paper describes generating "L = 10^3 datasets of N = 265 time series with length T" for the simulations (see the second sketch after this table for an illustrative generator), but does not report dataset splits (exact percentages, sample counts, or citations to predefined splits) for a fixed training, validation, or test set.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not list the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "For each model, for every T ranging from 10 to 500, we sample L = 10^3 datasets of N = 265 time series with length T from the model. We count how many times the clustering methodology (here, the choice of an algorithm and a correlation coefficient) is able to recover the underlying clusters defined by the correlation matrix." (See the final sketch after this table.)
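The trade-off quoted under Research Type hinges on how noisy correlation estimates are for short series. The snippet below is a minimal illustration (the value rho = 0.5 and the number of repetitions are assumptions, not taken from the paper): the empirical Pearson correlation of two series with true correlation 0.5 is widely dispersed at T = 10 and concentrates around 0.5 at T = 500.

```python
# Illustrative only: dispersion of the sample Pearson correlation versus series length T.
# rho and the repetition count are hypothetical parameters, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5
for T in (10, 100, 500):
    estimates = []
    for _ in range(2000):
        z = rng.standard_normal(T)                      # shared factor
        x = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal(T)
        y = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.standard_normal(T)
        estimates.append(np.corrcoef(x, y)[0, 1])       # sample correlation estimate
    print(f"T={T}: mean={np.mean(estimates):.3f}, std={np.std(estimates):.3f}")
```

The dispersion shrinks roughly like 1/sqrt(T), which is why very short series can produce spurious clusters while longer series give stable correlation estimates.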
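The Open Source Code row quotes "two models" without naming them, and the Dataset Splits row gives the simulation sizes (N = 265 series of length T). As a hedged illustration only, and not one of the paper's actual models, the sketch below generates Gaussian series with one shared factor per cluster; the cluster count K and within-cluster correlation rho are hypothetical parameters.

```python
# Illustrative one-factor-per-cluster Gaussian generator (not the paper's models).
# N = 265 is taken from the summary above; K and rho are assumed for illustration.
import numpy as np

def sample_dataset(N=265, T=100, K=5, rho=0.5, rng=None):
    """Sample N time series of length T split into K equally sized correlated
    clusters; returns the (N, T) data matrix and the planted cluster labels."""
    rng = np.random.default_rng(rng)
    labels = np.repeat(np.arange(K), int(np.ceil(N / K)))[:N]  # planted clusters
    common = rng.standard_normal((K, T))   # one shared factor per cluster
    noise = rng.standard_normal((N, T))    # idiosyncratic noise per series
    X = np.sqrt(rho) * common[labels] + np.sqrt(1 - rho) * noise
    return X, labels
```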
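Finally, a sketch of the convergence experiment described in the Experiment Setup row: for each length T, repeat L simulations, cluster the N series, and count exact recoveries of the planted partition. The clustering choices here (average-linkage hierarchical clustering on a Pearson-correlation distance) and the exact-recovery test (adjusted Rand index equal to 1) are stand-ins for whichever algorithm and correlation coefficient the paper evaluates; sample_dataset comes from the previous sketch.

```python
# Hedged sketch of the recovery-counting protocol; the clustering method and the
# exact-recovery criterion are assumptions, not necessarily the paper's choices.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

def recovers_clusters(X, true_labels, K):
    """True when the clustering reproduces the planted partition exactly."""
    corr = np.clip(np.corrcoef(X), -1.0, 1.0)   # N x N Pearson correlations
    dist = np.sqrt(2.0 * (1.0 - corr))          # correlation-based distance
    np.fill_diagonal(dist, 0.0)                 # guard against rounding noise
    Z = linkage(squareform(dist, checks=False), method="average")
    found = fcluster(Z, t=K, criterion="maxclust")
    return adjusted_rand_score(true_labels, found) == 1.0

def recovery_rate(T, L=1000, N=265, K=5, rho=0.5, seed=0):
    """Fraction of L simulated datasets whose clusters are exactly recovered."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(L):
        X, labels = sample_dataset(N=N, T=T, K=K, rho=rho, rng=rng)
        hits += recovers_clusters(X, labels, K)
    return hits / L

# Example sweep with a reduced number of repetitions:
# rates = {T: recovery_rate(T, L=100) for T in range(10, 501, 50)}
```

Sweeping T from 10 to 500 and plotting recovery_rate(T) gives an empirical recovery curve analogous in spirit to the convergence results the paper reports, under the assumptions stated above.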