Large Pre-trained Time Series Models for Cross-domain Time Series Analysis Tasks
Authors: Harshavardhan Prabhakar Kamarthi, B. Aditya Prakash
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LPTM on downstream forecasting and classification tasks from multiple domains and observe that LPTM consistently performs similarly to or better than previous state-of-the-art models, both under zero-shot evaluation and when fine-tuned with less training data and compute time. Overall, we also observe that LPTM typically requires less than 80% of the training data used by state-of-the-art baselines to provide similar or better performance. |
| Researcher Affiliation | Academia | Harshavardhan Kamarthi, College of Computing, Georgia Institute of Technology (harsha.pk@gatech.edu); B. Aditya Prakash, College of Computing, Georgia Institute of Technology (badityap@cc.gatech.edu) |
| Pseudocode | Yes | Algorithm 1: Adaptive Segmentation Module |
| Open Source Code | Yes | The code for the implementation of LPTM and the datasets are provided at an anonymized link, and hyperparameters are discussed in the Appendix. |
| Open Datasets | Yes | Epidemics: We use a large number of epidemic time-series aggregated by Project Tycho (van Panhuis et al., 2018)... Electricity: We use ETT electricity datasets (ETT1 and ETT2) collected from (Zhou et al., 2021)... Traffic Datasets: We use 2 datasets related to traffic speed prediction: PEMS-Bays (PEM-B) and METR-LA (Li et al., 2017)... M4 competition time-series: We also used the 3003 time-series of the M4 forecasting competition (Makridakis and Hibon, 2000)... Motion and behavioral sensor datasets: We use the set of sensor datasets extracted from the UEA archive (Bagnall et al., 2018) and the UCI Machine Learning Repository (Asuncion and Newman, 2007). |
| Dataset Splits | Yes | We use the default 12/4/4 train/val/test split and use the train split for pre-training as well. ... We use an 80-20 train-test split similar to Chowdhury et al. (2022). |
| Hardware Specification | Yes | The model is run on an Intel Xeon CPU with 64 cores and 128 GB RAM. We use a single A100 GPU with 80 GB memory. |
| Software Dependencies | No | The paper mentions software components like GRU, Transformer, and Adam optimizer but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | For the GRU we use a single hidden layer of 50 hidden units. The dimension of v is also 50. The transformer architecture consists of 10 layers with 8 attention heads each. For both pre-training and fine-tuning, we used the Adam optimizer with a learning rate of 0.001. For RANDMASK, we found the optimal γ = 0.4, and for LASTMASK γ = 0.2 was optimal. (A hedged configuration sketch follows the table.) |
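
To make the quoted setup concrete, below is a minimal PyTorch sketch that wires the reported hyperparameters together: a single-layer GRU with 50 hidden units per segment, a 10-layer transformer with 8 attention heads, Adam with learning rate 0.001, and the RANDMASK (γ = 0.4) / LASTMASK (γ = 0.2) ratios. This is not the authors' implementation: the class names, the `d_model = 64` projection width (PyTorch requires the model width to be divisible by the number of heads, so the 50-dimensional segment embeddings are projected up), and the exact semantics of RANDMASK/LASTMASK are assumptions made for illustration.

```python
# Hedged sketch of the quoted experiment setup; not the authors' code.
import torch
import torch.nn as nn


class SegmentEncoder(nn.Module):
    """Encodes one time-series segment with a single-layer GRU (50 hidden units)."""

    def __init__(self, input_dim=1, hidden_dim=50):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, num_layers=1, batch_first=True)

    def forward(self, segment):            # segment: (batch, seg_len, input_dim)
        _, h_n = self.gru(segment)         # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)              # segment embedding v, dimension 50


class LPTMBackboneSketch(nn.Module):
    """Transformer stack over segment embeddings: 10 layers, 8 attention heads.

    d_model=64 is an assumption: PyTorch needs d_model % nhead == 0, so the
    50-dim segment embeddings are linearly projected up before the encoder.
    """

    def __init__(self, seg_dim=50, d_model=64, n_layers=10, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(seg_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, seg_dim)  # e.g., to reconstruct masked segment embeddings

    def forward(self, seg_embeddings):     # seg_embeddings: (batch, n_segments, 50)
        return self.head(self.encoder(self.proj(seg_embeddings)))


def randmask(n_segments, gamma=0.4):
    """RANDMASK (assumed semantics): mask a random gamma-fraction of segments."""
    return torch.rand(n_segments) < gamma


def lastmask(n_segments, gamma=0.2):
    """LASTMASK (assumed semantics): mask the last gamma-fraction of segments."""
    mask = torch.zeros(n_segments, dtype=torch.bool)
    mask[int((1 - gamma) * n_segments):] = True
    return mask


if __name__ == "__main__":
    enc = SegmentEncoder()
    v = enc(torch.randn(4, 16, 1))                  # one 16-step segment per series -> (4, 50)

    model = LPTMBackboneSketch()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr = 0.001 as quoted

    seg_embeddings = torch.randn(4, 20, 50)         # 4 series, 20 segment embeddings each
    out = model(seg_embeddings)
    print(out.shape, randmask(20).sum().item(), lastmask(20).sum().item())
```

The sketch only checks that the quoted widths, depths, and optimizer settings fit together; the paper's Adaptive Segmentation Module and pre-training objectives are not reproduced here.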