Automatic Segmentation of Data Sequences

Authors: Liangzhe Chen, Sorour E. Amiri, B. Aditya Prakash

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran DASSA on multiple real datasets of varying sizes, and it is very effective in finding the time-cut points of the segmentations (in some cases recovering the cut points perfectly) as well as in finding the corresponding changing patterns.
Researcher Affiliation | Academia | Liangzhe Chen, Sorour E. Amiri, B. Aditya Prakash, Department of Computer Science, Virginia Tech. Email: {liangzhe, esorour, badityap}@cs.vt.edu
Pseudocode | Yes | Algorithm 1: Pseudo-code for DASSA; Algorithm 2: Pseudo-code of DAG-ALP
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | No | Datasets. DASSA works for general data sequences, hence we collected real-world datasets from different domains to test it. Tab. 1 shows the content of each data sequence. These sequences contain different data types, such as age, town id (categorical), and sensor observations (real), as well as different time units; some of them (like Portland and Ebola) have arbitrary time stamps (a data point can have any time stamp value, so there may be a different number of data points at each time stamp).
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It mentions 'cross validation' only in the context of setting parameters, not for partitioning data for model evaluation.
Hardware Specification | Yes | Our experiments are conducted on a machine with 4 Xeon E7-4850 CPUs and 512 GB of 1066 MHz main memory, and DASSA takes 30 minutes to run on average for our datasets.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the implementation of DASSA or its experiments.
Experiment Setup | Yes | For all the datasets, we set a discretization level k = 10, as it leads to a reasonable running time and the performance is stable around 10 (k = 5 and k = 15 give similar results). When constructing the segment-graph in practice, we ignore segments with fewer than 5% of |D| data values (a small fraction of all segments), as they have too few observations and are not interesting for the final segmentation.
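The 5% segment filter described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not DASSA's implementation: it assumes a candidate segment is a contiguous time interval [a, b) between candidate cut points, and the function name `candidate_segments`, the `boundaries` argument, and the toy data are hypothetical. The toy time stamps deliberately carry a different number of data points at each time stamp, as in the Portland and Ebola sequences; the discretization level k is not modeled.

```python
from bisect import bisect_left
from itertools import combinations


def candidate_segments(timestamps, boundaries, min_fraction=0.05):
    """Enumerate candidate segments [a, b) between cut points and keep only
    those holding at least `min_fraction` of all |D| data points.

    Hypothetical helper: assumes a segment is a contiguous time interval;
    DASSA's actual segment-graph construction may differ.
    """
    ts = sorted(timestamps)
    n_total = len(ts)                     # |D|, total number of data values
    min_count = min_fraction * n_total    # e.g. 5% of |D|
    kept = []
    for a, b in combinations(sorted(boundaries), 2):
        # Points with a <= t < b, counted via binary search on sorted stamps.
        count = bisect_left(ts, b) - bisect_left(ts, a)
        if count >= min_count:            # drop segments below the threshold
            kept.append((a, b, count))
    return kept


# Toy sequence with a different number of data points at each time stamp.
timestamps = [0] * 10 + [1] * 1 + [2] * 9 + [3] * 20   # |D| = 40
boundaries = [0, 1, 2, 3, 4]                             # hypothetical cut points
kept = candidate_segments(timestamps, boundaries)
print(len(kept))  # 9
```

Here |D| = 40, so the threshold is 2 data points; of the 10 possible intervals, only [1, 2), which holds a single point, is dropped. Sorting once and counting with binary search keeps each per-segment count at O(log |D|), which matters because the number of candidate intervals grows quadratically with the number of cut points.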