Large Pre-trained Time Series Models for Cross-domain Time Series Analysis Tasks

Authors: Harshavardhan Prabhakar Kamarthi, B. Aditya Prakash

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LPTM on downstream forecasting and classification tasks from multiple domains and observe that LPTM consistently provides performance similar to or better than previous state-of-the-art models, usually under zero-shot evaluation as well as when fine-tuned with less training data and compute time. Overall, we also observe that LPTM typically requires less than 80% of the training data used by state-of-the-art baselines to provide similar or better performance.
Researcher Affiliation | Academia | Harshavardhan Kamarthi, College of Computing, Georgia Institute of Technology, harsha.pk@gatech.edu; B. Aditya Prakash, College of Computing, Georgia Institute of Technology, badityap@cc.gatech.edu
Pseudocode | Yes | Algorithm 1: Adaptive Segmentation Module
Open Source Code | Yes | The code for the implementation of LPTM and the datasets are provided at an anonymized link, and hyperparameters are discussed in the Appendix.
Open Datasets | Yes | Epidemics: We use a large number of epidemic time-series aggregated by Project Tycho (van Panhuis et al., 2018)... Electricity: We use the ETT electricity datasets (ETT1 and ETT2) collected from (Zhou et al., 2021)... Traffic Datasets: We use 2 datasets related to traffic speed prediction, PEMS-Bay (PEM-B) and METR-LA (Li et al., 2017)... M4 competition time-series: We also used the 3003 time-series of the M4 forecasting competition (Makridakis and Hibon, 2000)... Motion and behavioral sensor datasets: We use the set of sensor datasets extracted from the UEA archive (Bagnall et al., 2018) and the UCI Machine Learning Repository (Asuncion and Newman, 2007). (A hedged data-loading sketch follows the table.)
Dataset Splits | Yes | We use the default 12/4/4 train/val/test split and use the train split for pre-training as well. We use an 80-20 train-test split similar to Chowdhury et al. (2022). (A hedged splitting sketch follows the table.)
Hardware Specification | Yes | The model is run on an Intel Xeon CPU with 64 cores and 128 GB RAM. We use a single A100 GPU with 80 GB memory.
Software Dependencies | No | The paper mentions software components like GRU, Transformer, and the Adam optimizer but does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the GRU we use a single hidden layer of 50 hidden units. The dimension of v is also 50. The transformer architecture consists of 10 layers with 8 attention heads each. For both pre-training and fine-tuning, we used the Adam optimizer with a learning rate of 0.001. For RANDMASK, we found the optimal γ = 0.4, and for LASTMASK γ = 0.2 was optimal. (A hedged configuration sketch follows the table.)
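
The Open Datasets row lists only public sources. As a reading aid, here is a minimal, hedged sketch of loading one of them, an ETT file, with pandas; the local file name ETTh1.csv, the date column, and the OT target column follow the public ETDataset release, but this is not the authors' data pipeline.

    # Hedged sketch (not the paper's code): read a locally downloaded ETT CSV.
    import pandas as pd

    def load_ett(csv_path="ETTh1.csv"):
        # ETT files ship with a "date" column plus load/temperature channels;
        # "OT" (oil temperature) is the usual forecasting target.
        df = pd.read_csv(csv_path, parse_dates=["date"])
        return df

    if __name__ == "__main__":
        ett = load_ett()
        target = ett["OT"]  # univariate target, used here only for illustration
        print(len(target), "time steps loaded")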
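
The Dataset Splits row quotes two protocols: the 12/4/4 train/val/test convention used for the ETT benchmarks (commonly read as months) and the 80-20 chronological train-test split used for the classification datasets following Chowdhury et al. (2022). The sketch below illustrates both; the month-based interpretation of 12/4/4 and the date column name are assumptions.

    # Hedged sketch of the two quoted split protocols; not the authors' code.
    import pandas as pd

    def split_80_20(values):
        # Chronological 80-20 train-test split over a sequence.
        cut = int(0.8 * len(values))
        return values[:cut], values[cut:]

    def split_12_4_4(df, time_col="date"):
        # Assumed ETT-style split: first 12 months train, next 4 val, last 4 test.
        df = df.sort_values(time_col)
        month = df[time_col].dt.to_period("M")
        months = month.unique()
        train = df[month.isin(months[:12])]
        val = df[month.isin(months[12:16])]
        test = df[month.isin(months[16:20])]
        return train, val, test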
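
The Experiment Setup row pins the main hyperparameters. The sketch below writes them down in PyTorch so they can be checked at a glance; how the GRU segment encoder, the transformer stack, and the masking ratios are actually composed inside LPTM is not reproduced here, and the 50-to-64 projection is purely an assumption made because PyTorch requires the model dimension to be divisible by the number of heads.

    # Hedged configuration sketch using the quoted hyperparameters; not the released LPTM code.
    import torch
    from torch import nn

    HIDDEN = 50           # single GRU hidden layer with 50 units; dim of v is also 50
    N_LAYERS = 10         # transformer layers
    N_HEADS = 8           # attention heads per layer
    LR = 1e-3             # Adam learning rate for both pre-training and fine-tuning
    GAMMA_RANDMASK = 0.4  # masking fraction for RANDMASK (interpretation assumed)
    GAMMA_LASTMASK = 0.2  # masking fraction for LASTMASK (interpretation assumed)

    # GRU segment encoder over univariate segments (input size of 1 is an assumption).
    segment_encoder = nn.GRU(input_size=1, hidden_size=HIDDEN, batch_first=True)

    # PyTorch requires d_model % nhead == 0, so project the 50-dim embeddings to
    # 64 dims before the encoder stack (an assumption; the paper does not state this).
    D_MODEL = 64
    project = nn.Linear(HIDDEN, D_MODEL)
    encoder_layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS,
                                               batch_first=True)
    backbone = nn.TransformerEncoder(encoder_layer, num_layers=N_LAYERS)

    params = (list(segment_encoder.parameters()) + list(project.parameters())
              + list(backbone.parameters()))
    optimizer = torch.optim.Adam(params, lr=LR)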