Coresets for Time Series Clustering

Authors: Lingxiao Huang, K. Sudhir, Nisheeth Vishnoi

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically assess the performance of our coresets with synthetic data.
Researcher Affiliation | Academia | Lingxiao Huang (Tsinghua University); K. Sudhir (Yale University); Nisheeth K. Vishnoi (Yale University)
Pseudocode | Yes | Algorithm 1 (CRGMM): Coreset construction for GMM time series clustering
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | We generate two sets of synthetic data all with 250K observations with different number of entities N and observations per individual T_i: (i) N = T_i = 500 for all i ∈ [N] and (ii) N = 200, T_i = 1250 for all i ∈ [N]. ... Given these draws of parameters, we generate a GMM time-series dataset as follows: For each i ∈ [N], draw l ∈ [k] given α. Then, for all t ∈ [T_i], draw e_it ∈ R^d with covariance matrix Σ^(l) and autocorrelation matrix Λ^(l), and compute x_it = µ^(l) + e_it ∈ R^d.
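The quoted generation procedure can be sketched as follows. All sizes and parameter values here are illustrative toy choices (the paper uses N = T_i = 500 or N = 200, T_i = 1250, totaling 250K observations), and the AR(1)-style role of the autocorrelation matrix Λ^(l) is an assumption about how it enters the error process:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, d = 10, 3, 2            # entities, mixture components, dimension (toy sizes)
T = [5] * N                   # observations per entity T_i

alpha = np.full(k, 1.0 / k)                  # mixing weights alpha
mu = rng.normal(size=(k, d))                 # component means mu^(l)
Sigma = np.stack([np.eye(d)] * k)            # covariance matrices Sigma^(l)
Lam = np.stack([0.5 * np.eye(d)] * k)        # autocorrelation matrices Lambda^(l)

def generate_entity(i):
    """Draw l in [k] given alpha, then an autocorrelated error series e_it,
    and return x_it = mu^(l) + e_it for t in [T_i]."""
    l = rng.choice(k, p=alpha)
    e = np.zeros((T[i], d))
    for t in range(T[i]):
        noise = rng.multivariate_normal(np.zeros(d), Sigma[l])
        # Assumed AR(1)-style dynamics: e_t = Lambda^(l) e_{t-1} + noise
        e[t] = noise if t == 0 else Lam[l] @ e[t - 1] + noise
    return mu[l] + e

data = [generate_entity(i) for i in range(N)]
```

Each entry of `data` is a T_i × d array, one time series per entity, matching the shape of the dataset the quote describes.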
Dataset Splits | No | The paper does not specify exact training, validation, or test splits for its synthetic datasets. It refers to evaluating the model on the 'full dataset' and 'coresets', which are sampled subsets.
Hardware Specification | No | The experiments are conducted with PyCharm IDE on a computer with 8-core CPU and 32 GB RAM.
Software Dependencies | No | The paper mentions 'PyCharm IDE' but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | No | The paper describes the model generation parameters (d, k, λ) and the EM algorithm used for optimization, but it does not provide specific training hyperparameters such as learning rates, batch sizes, number of epochs, or detailed convergence criteria.
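To make concrete what "convergence criteria" for EM would look like, here is a minimal 1-D two-component EM loop. The tolerance and iteration cap are common defaults chosen for illustration, not values reported by the paper, and the toy data is unrelated to the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy 1-D sample from two well-separated Gaussians (illustrative only)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Assumed hyperparameters: stopping tolerance on log-likelihood gain and max iterations
tol, max_iter = 1e-6, 100
mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

prev_ll = -np.inf
for _ in range(max_iter):
    # E-step: per-point responsibilities under current parameters
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    # Convergence criterion: stop when the log-likelihood gain falls below tol
    ll = np.log(dens.sum(axis=1)).sum()
    if ll - prev_ll < tol:
        break
    prev_ll = ll
```

Reporting `tol` and `max_iter` (plus initialization) is exactly the kind of detail whose absence the assessment flags.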