MOMENT: A Family of Open Time-series Foundation Models

Authors: Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning.
Researcher Affiliation | Academia | (1) Auton Lab, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA; (2) University of Pennsylvania, Philadelphia, USA.
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. Method steps are described in prose.
Open Source Code | Yes | Pre-trained models (AutonLab/MOMENT-1-large) and the Time Series Pile (AutonLab/Timeseries-PILE) are available at https://huggingface.co/AutonLab.
Open Datasets | Yes | We compiled the Time Series Pile, a large collection of publicly available data from diverse domains, ranging from healthcare to engineering to finance. The Time Series Pile comprises over 5 public time series databases from several diverse domains for pre-training and evaluation (Tab. 11). Pre-trained models (AutonLab/MOMENT-1-large) and the Time Series Pile (AutonLab/Timeseries-PILE) are available at https://huggingface.co/AutonLab (a hedged loading sketch follows the table).
Dataset Splits | Yes | Minimizing data contamination using careful train-test splitting: we carefully split each dataset into disjoint training, validation, and test splits, based on splits specified by the data creators. When these splits are not available, we randomly sample 60% of the data for training, 10% for validation, and 30% for testing (an illustrative split sketch follows the table).
Hardware Specification | Yes | All models were trained and evaluated on a computing cluster consisting of 128 AMD EPYC 7502 CPUs, 503 GB of RAM, and 8 NVIDIA RTX A6000 GPUs, each with 49 GiB RAM. All MOMENT variants were trained on a single A6000 GPU, without any data or model parallelism.
Software Dependencies | No | The paper mentions several libraries ('Time-Series-Library', 'universal-computation', 'Anomaly-Transformer', 'VUS', 'tsad-model-selection', 'One-Fits-All', 'Statsforecast') but does not specify their version numbers, which are needed to reproduce the software environment.
Experiment Setup | Yes | Pre-training setup: we pre-train three different sizes of MOMENT, roughly corresponding to the sizes of the encoders in T5-Small, Base, and Large. Specifically, the Base (Small, Large) model uses a 12 (6, 24) layer Transformer with hidden dimensions of size D = 768 (512, 1024), 12 (8, 16) attention heads, and feed-forward networks of size 3072 (2048, 4096)... All models take an input time series of length T = 512, breaking it into N = 64 disjoint patches of length P = 8. We mask 30% of the patches uniformly at random during pre-training. We use the Adam optimizer with weight decay (Loshchilov & Hutter, 2019) with λ = 0.05, β1 = 0.9, β2 = 0.999. We clip the gradient at 5.0, train models using a batch size of 2048, and use a cosine learning rate schedule with initial and final learning rates of 1e-4 and 1e-5, respectively... We train all models for 2 epochs (a configuration sketch follows the table).
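
The availability claims in the Open Source Code and Open Datasets rows can be checked directly against the Hugging Face Hub. The sketch below uses only the repository IDs quoted above and generic Hub tooling (huggingface_hub); it is not the authors' own loading code, and the released package may expose a different entry point.

```python
# Hedged sketch: download the released weights and the Time Series Pile from
# the Hugging Face Hub. Only the repository IDs come from the paper; the rest
# is standard Hub tooling.
from huggingface_hub import snapshot_download

# Pre-trained MOMENT-1-large weights (model repository).
model_dir = snapshot_download(repo_id="AutonLab/MOMENT-1-large")

# The Time Series Pile (dataset repository, hence repo_type="dataset").
# Note: the full pile is large; expect a long download.
data_dir = snapshot_download(repo_id="AutonLab/Timeseries-PILE", repo_type="dataset")

print("model files in:", model_dir)
print("dataset files in:", data_dir)
```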
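
The Dataset Splits row specifies a 60/10/30 fallback when creator-specified splits are unavailable. The snippet below is an illustrative sketch of that rule only; the function name, seed handling, and use of NumPy are assumptions, not the authors' splitting code.

```python
# Illustrative 60/10/30 random split for datasets without creator-specified
# splits. Only the ratios come from the paper; names are hypothetical.
import numpy as np

def random_split_indices(n_items: int, seed: int = 0):
    """Return disjoint train/val/test index arrays in a 60/10/30 ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(0.6 * n_items)
    n_val = int(0.1 * n_items)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

train_idx, val_idx, test_idx = random_split_indices(1000)
# The three index sets are pairwise disjoint and cover all items.
assert set(train_idx).isdisjoint(val_idx) and set(val_idx).isdisjoint(test_idx)
assert len(train_idx) + len(val_idx) + len(test_idx) == 1000
```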
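
The Experiment Setup row fixes most of the pre-training hyperparameters. The sketch below wires the quoted numbers (T = 512, P = 8, N = 64, 30% patch masking, AdamW with weight decay 0.05 and betas (0.9, 0.999), gradient clipping at 5.0, cosine schedule from 1e-4 to 1e-5) into a single standard PyTorch step. The patch embedding, encoder, reconstruction head, masking-by-zeroing, and step count are generic stand-ins, not the MOMENT architecture or training code.

```python
# Hedged configuration sketch for the Large variant: hyperparameters are taken
# from the table; the modules are generic PyTorch stand-ins.
import torch
import torch.nn as nn

T, P = 512, 8                                 # input length, patch length
N = T // P                                    # 64 disjoint patches
MASK_RATIO = 0.30                             # fraction of patches masked
D, HEADS, LAYERS, FFN = 1024, 16, 24, 4096    # "Large" encoder shape
# Reported batch size is 2048; a tiny batch is used below to keep the sketch cheap.

def patchify(x: torch.Tensor) -> torch.Tensor:
    """Split (batch, T) series into (batch, N, P) disjoint patches."""
    return x.reshape(x.shape[0], N, P)

# Stand-in modules (the real MOMENT encoder is not reproduced here).
patch_embed = nn.Linear(P, D)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=HEADS, dim_feedforward=FFN,
                               batch_first=True),
    num_layers=LAYERS,
)
recon_head = nn.Linear(D, P)

params = (list(patch_embed.parameters()) + list(encoder.parameters())
          + list(recon_head.parameters()))
optimizer = torch.optim.AdamW(params, lr=1e-4, weight_decay=0.05, betas=(0.9, 0.999))
total_steps = 1_000                           # placeholder; the paper trains for 2 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps,
                                                       eta_min=1e-5)

# One illustrative pre-training step on random data standing in for real series.
x = torch.randn(4, T)
patches = patchify(x)                                   # (4, 64, 8)
mask = torch.rand(x.shape[0], N) < MASK_RATIO           # True = hidden patch
tokens = patch_embed(patches)                           # (4, 64, D)
tokens = tokens.masked_fill(mask.unsqueeze(-1), 0.0)    # crude stand-in for a [MASK] embedding
recon = recon_head(encoder(tokens))                     # (4, 64, 8)
loss = (recon[mask] - patches[mask]).pow(2).mean()      # reconstruct only masked patches
loss.backward()
torch.nn.utils.clip_grad_norm_(params, 5.0)             # gradient clipping at 5.0
optimizer.step(); scheduler.step(); optimizer.zero_grad()
```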