Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Coresets for Time Series Clustering

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically assess the performance of our coresets with synthetic data. |
| Researcher Affiliation | Academia | Lingxiao Huang (Tsinghua University), K. Sudhir (Yale University), Nisheeth K. Vishnoi (Yale University) |
| Pseudocode | Yes | Algorithm 1 (CRGMM): coreset construction for GMM time-series clustering. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | We generate two sets of synthetic data, all with 250K observations, with different numbers of entities N and observations per individual T_i: (i) N = T_i = 500 for all i ∈ [N], and (ii) N = 200, T_i = 1250 for all i ∈ [N]. ... Given these draws of parameters, we generate a GMM time-series dataset as follows: for each i ∈ [N], draw l ∈ [k] given α; then, for all t ∈ [T_i], draw e_it ∈ R^d with covariance matrix Σ^(l) and autocorrelation matrix Λ^(l), and compute x_it = µ^(l) + e_it ∈ R^d. |
| Dataset Splits | No | The paper does not specify exact training, validation, or test splits for its synthetic datasets. It refers to evaluating the model on the "full dataset" and on "coresets", which are sampled subsets. |
| Hardware Specification | No | The experiments are conducted with the PyCharm IDE on a computer with an 8-core CPU and 32 GB RAM. |
| Software Dependencies | No | The paper mentions the PyCharm IDE but does not provide version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper describes the model-generation parameters (d, k, λ) and the EM algorithm used for optimization, but it does not provide specific training hyperparameters such as learning rates, batch sizes, number of epochs, or detailed convergence criteria. |
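The data-generating process quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration, not the authors' code: it simplifies the component covariance Σ^(l) to a per-dimension standard deviation, models the autocorrelation Λ^(l) as a scalar AR(1) coefficient per component, and all parameter values below are made up for the toy run.

```python
import random

def generate_gmm_timeseries(N, T, k, d, alpha, mu, sigma, lam, seed=0):
    """Sketch of the synthetic GMM time-series generator described in the paper.

    alpha    : mixing weights over the k components
    mu[l]    : length-d mean vector of component l
    sigma[l] : noise std of component l (diagonal-covariance simplification)
    lam[l]   : AR(1) autocorrelation coefficient of component l (scalar simplification)
    """
    rng = random.Random(seed)
    data, labels = [], []
    for i in range(N):
        # Draw the latent mixture component l in [k] given alpha.
        l = rng.choices(range(k), weights=alpha)[0]
        labels.append(l)
        e = [0.0] * d  # autocorrelated noise state e_it
        series = []
        for t in range(T):
            # AR(1) noise: e_t = lam * e_{t-1} + fresh Gaussian innovation.
            e = [lam[l] * e[j] + rng.gauss(0.0, sigma[l]) for j in range(d)]
            # Observation x_it = mu^(l) + e_it.
            series.append([mu[l][j] + e[j] for j in range(d)])
        data.append(series)
    return data, labels

# Toy run with k = 2 components in d = 2 dimensions (illustrative values only).
data, labels = generate_gmm_timeseries(
    N=5, T=10, k=2, d=2,
    alpha=[0.5, 0.5],
    mu=[[0.0, 0.0], [5.0, 5.0]],
    sigma=[1.0, 1.0],
    lam=[0.3, 0.3],
)
print(len(data), len(data[0]), len(data[0][0]))  # 5 entities, 10 steps, 2 dims
```

The paper's settings (e.g. N = T_i = 500, or N = 200 with T_i = 1250) would be passed in the same way; a full reproduction would replace the diagonal-noise simplification with draws from multivariate normals using the actual Σ^(l) and matrix-valued Λ^(l).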