Automatic Segmentation of Data Sequences
Authors: Liangzhe Chen, Sorour E. Amiri, B. Aditya Prakash
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We ran DASSA on multiple real datasets of varying sizes and it is very effective in finding the time-cut points of the segmentations (in some cases recovering the cut points perfectly) as well as in finding the corresponding changing patterns. |
| Researcher Affiliation | Academia | Liangzhe Chen, Sorour E. Amiri, B. Aditya Prakash Department of Computer Science, Virginia Tech. Email: {liangzhe, esorour, badityap}@cs.vt.edu |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for DASSA; Algorithm 2: Pseudo-code of DAG-ALP |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository for the described methodology. |
| Open Datasets | No | Datasets. DASSA works for general data sequences, hence we collected real-world datasets from different domains to test. Tab. 1 shows the content of each data sequence. These sequences contain different data types like age, town id (categorical), sensor observations (real), etc., different time units, and some of them (like Portland, Ebola) have arbitrary time stamps (a data point can have any time stamp value, and as a result there may be a different number of data points at each time stamp). |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It only mentions 'cross validation' in the context of setting parameters, not for data partitioning for model evaluation. |
| Hardware Specification | Yes | Our experiments are conducted on a 4 Xeon E7-4850 CPU with 512GB of 1066 MHz main memory and DASSA takes 30m to run on average for our datasets. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers used for the implementation of DASSA or its experiments. |
| Experiment Setup | Yes | For all the datasets, we set a discretization level k = 10 as it leads to a reasonable running time, and the performance is stable around 10 (k = 5, 15 gives similar results). When constructing the segment-graph in practice, we ignore segments with less than 5% of |D| data values (which is a small fraction of all segments), as they have too few observations, and are not interesting for the final segmentation. |
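
The two settings quoted in the Experiment Setup row (discretization level k = 10 and dropping segments with less than 5% of |D| data values) can be illustrated with a minimal sketch. This is not the authors' implementation: the equal-width binning, the half-open `[start, end)` segment convention, and the names `discretize` and `candidate_segments` are all assumptions made for illustration, since the quoted text does not say how values are binned or how segments are enumerated.

```python
from bisect import bisect_left

K = 10               # discretization level quoted in the Experiment Setup row
MIN_FRACTION = 0.05  # segments with less than 5% of |D| values are ignored


def discretize(values, k=K):
    """Map real values to integer bins 0..k-1.

    Equal-width binning is an assumption; the quoted setup only fixes k.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into bin k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def candidate_segments(timestamps, min_fraction=MIN_FRACTION):
    """Enumerate (start, end) intervals holding at least min_fraction of the data.

    timestamps: sorted list of numeric time stamps, one per data value.
    Each segment covers the half-open interval [start, end) (hypothetical
    convention chosen for this sketch).
    """
    total = len(timestamps)
    cuts = sorted(set(timestamps))
    cuts.append(cuts[-1] + 1)  # sentinel end so the last time stamp is reachable
    kept = []
    for i, start in enumerate(cuts[:-1]):
        for end in cuts[i + 1:]:
            count = bisect_left(timestamps, end) - bisect_left(timestamps, start)
            if count >= min_fraction * total:
                kept.append((start, end))
    return kept
```

Because the count is taken per data value rather than per time stamp, the filter also handles sequences with arbitrary time stamps, where different time stamps carry different numbers of data points.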