Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Coresets for Time Series Clustering

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically assess the performance of our coresets with synthetic data. |
| Researcher Affiliation | Academia | Lingxiao Huang (Tsinghua University), K. Sudhir (Yale University), Nisheeth K. Vishnoi (Yale University) |
| Pseudocode | Yes | Algorithm 1 (CRGMM): coreset construction for GMM time-series clustering. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | We generate two sets of synthetic data, all with 250K observations, with different numbers of entities N and observations per individual T_i: (i) N = T_i = 500 for all i ∈ [N], and (ii) N = 200, T_i = 1250 for all i ∈ [N]. ... Given these draws of parameters, we generate a GMM time-series dataset as follows: for each i ∈ [N], draw l ∈ [k] given α; then, for all t ∈ [T_i], draw e_it ∈ R^d with covariance matrix Σ^(l) and autocorrelation matrix Λ^(l), and compute x_it = µ^(l) + e_it ∈ R^d. |
| Dataset Splits | No | The paper does not specify exact training, validation, or test splits for its synthetic datasets. It refers to evaluating the model on the "full dataset" and on "coresets", which are sampled subsets. |
| Hardware Specification | No | The experiments are conducted with the PyCharm IDE on a computer with an 8-core CPU and 32 GB RAM. |
| Software Dependencies | No | The paper mentions the PyCharm IDE but does not provide version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper describes the model-generation parameters (d, k, λ) and the EM algorithm used for optimization, but it does not provide specific training hyperparameters such as learning rates, batch sizes, number of epochs, or detailed convergence criteria. |
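The data-generating process quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration, not the authors' code: it simplifies the component covariance Σ^(l) to a per-dimension standard deviation, models the autocorrelation Λ^(l) as a scalar AR(1) coefficient per component, and all parameter values below are made up for the toy run.

```python
import random

def generate_gmm_timeseries(N, T, k, d, alpha, mu, sigma, lam, seed=0):
    """Sketch of the synthetic GMM time-series generator described in the paper.

    alpha    : mixing weights over the k components
    mu[l]    : length-d mean vector of component l
    sigma[l] : noise std of component l (diagonal-covariance simplification)
    lam[l]   : AR(1) autocorrelation coefficient of component l (scalar simplification)
    """
    rng = random.Random(seed)
    data, labels = [], []
    for i in range(N):
        # Draw the latent mixture component l in [k] given alpha.
        l = rng.choices(range(k), weights=alpha)[0]
        labels.append(l)
        e = [0.0] * d  # autocorrelated noise state e_it
        series = []
        for t in range(T):
            # AR(1) noise: e_t = lam * e_{t-1} + fresh Gaussian innovation.
            e = [lam[l] * e[j] + rng.gauss(0.0, sigma[l]) for j in range(d)]
            # Observation x_it = mu^(l) + e_it.
            series.append([mu[l][j] + e[j] for j in range(d)])
        data.append(series)
    return data, labels

# Toy run with k = 2 components in d = 2 dimensions (illustrative values only).
data, labels = generate_gmm_timeseries(
    N=5, T=10, k=2, d=2,
    alpha=[0.5, 0.5],
    mu=[[0.0, 0.0], [5.0, 5.0]],
    sigma=[1.0, 1.0],
    lam=[0.3, 0.3],
)
print(len(data), len(data[0]), len(data[0][0]))  # 5 entities, 10 steps, 2 dims
```

The paper's settings (e.g. N = T_i = 500, or N = 200 with T_i = 1250) would be passed in the same way; a full reproduction would replace the diagonal-noise simplification with draws from multivariate normals using the actual Σ^(l) and matrix-valued Λ^(l).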