Probabilistic Transformer For Time Series Analysis

Authors: Binh Tang, David S Matteson

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present our experimental results on two tasks, namely time series forecasting and human motion prediction. These tasks are often studied independently, despite being almost identical as conditional prediction problems. Table 1: Test-set CRPS_sum of time series forecasting models (lower is better). The means and standard deviations are computed over five runs using different random seeds. (A sample-based sketch of the CRPS_sum metric is given after the table.)
Researcher Affiliation | Academia | Binh Tang, Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850, bvt5@cornell.edu; David S. Matteson, Department of Statistics and Data Science, Cornell University, Ithaca, NY 14850, matteson@cornell.edu
Pseudocode | Yes | Starting with a learnable, context-agnostic representation $w_0$, we recursively update $w_t$ using a stochastic sample from $p_\theta(z_t \mid z_{1:t-1}, x_{1:C})$ and the positional embedding for the current time step $t$. The generating process for time step $t$ can be summarized by the following pseudocode:
$\bar{w}_t = \mathrm{LayerNorm}(w_{t-1} + \mathrm{Attention}(w_{t-1}, w_{1:t-1}, w_{1:t-1}))$ (6)
$\hat{w}_t = \mathrm{LayerNorm}(\bar{w}_t + \mathrm{Attention}(\bar{w}_t, h_{1:C}, h_{1:C}))$ (7)
$z_t = \mathrm{Sample}(\mathcal{N}(z_t; \mathrm{MLP}(\hat{w}_t), \mathrm{Softplus}(\mathrm{MLP}(\hat{w}_t))))$ (8)
$w_t = \mathrm{LayerNorm}(\hat{w}_t + \mathrm{MLP}(z_t) + \mathrm{Position}(t))$ (9)
(A PyTorch sketch of this generating step is given after the table.)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. We are also committed to open-sourcing our code upon publication.
Open Datasets | Yes | Following the experimental setup in [72, 73, 75], we evaluate our models and multiple competitive baselines on five popular public datasets: SOLAR, ELECTRICITY, TRAFFIC, TAXI, and WIKIPEDIA.
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix D in the supplemental material.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix D in the supplemental material.
Software Dependencies | No | The paper mentions that hyperparameters and training processes are detailed in Appendix D, but it does not specify any software dependencies with version numbers in the provided text.
Experiment Setup | Yes | We use 8-head attention and 2-layer MLPs to parametrize the generative and inference models. The stochastic latent variables $z_t$ are 16-dimensional, while the hidden representations $w_t$ are in $\mathbb{R}^{128}$. Our probabilistic transformers for SOLAR and ELECTRICITY have one stochastic layer, while those for the other datasets, which have higher-dimensional observations, employ two layers. (These dimensions are used in the usage sketch after the table.)
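To make the quoted pseudocode (eqs. 6-9) concrete, the following is a minimal PyTorch sketch of one decoding step, assuming standard `nn.MultiheadAttention`, separate MLP heads for the Gaussian mean and scale, and a reparametrized sample for $z_t$. The class, method, and variable names are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GenerativeStep(nn.Module):
    """Sketch of one decoding step of eqs. (6)-(9): self-attention over past
    states w_{1:t-1}, cross-attention over encoder states h_{1:C}, a stochastic
    latent z_t, and the updated hidden state w_t."""

    def __init__(self, d_model=128, z_dim=16, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.mlp_mu = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                    nn.Linear(d_model, z_dim))
        self.mlp_sigma = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                       nn.Linear(d_model, z_dim))
        self.mlp_z = nn.Sequential(nn.Linear(z_dim, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))

    def forward(self, w_prev, w_hist, h_ctx, pos_t):
        # (6) self-attention of w_{t-1} over the past decoder states w_{1:t-1}
        attn1, _ = self.self_attn(w_prev, w_hist, w_hist)
        w_bar = self.norm1(w_prev + attn1)
        # (7) cross-attention over the encoder representations h_{1:C}
        attn2, _ = self.cross_attn(w_bar, h_ctx, h_ctx)
        w_hat = self.norm2(w_bar + attn2)
        # (8) sample z_t ~ N(mu, sigma^2), with sigma from a Softplus head
        mu = self.mlp_mu(w_hat)
        sigma = nn.functional.softplus(self.mlp_sigma(w_hat))
        z_t = mu + sigma * torch.randn_like(sigma)
        # (9) update the hidden state with z_t and the positional embedding
        w_t = self.norm3(w_hat + self.mlp_z(z_t) + pos_t)
        return w_t, z_t
```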
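Using the dimensions reported in the Experiment Setup row (128-dimensional hidden states $w_t$, 16-dimensional latents $z_t$, 8 attention heads), a hypothetical call to the sketch above looks as follows; the batch size, context length, and number of already-decoded steps are placeholders.

```python
# Illustrative usage with the reported dimensions; all sizes are placeholders.
step = GenerativeStep(d_model=128, z_dim=16, n_heads=8)
B, t, C = 32, 5, 24                      # batch, decoded steps so far, context length
w_prev = torch.randn(B, 1, 128)          # w_{t-1}
w_hist = torch.randn(B, t, 128)          # w_{1:t-1}
h_ctx = torch.randn(B, C, 128)           # encoder states h_{1:C}
pos_t = torch.randn(1, 1, 128)           # positional embedding for step t
w_t, z_t = step(w_prev, w_hist, h_ctx, pos_t)
print(w_t.shape, z_t.shape)              # (32, 1, 128) and (32, 1, 16)
```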
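For reference, the CRPS_sum metric reported in Table 1 is commonly computed in multivariate forecasting work by summing the target series across dimensions and evaluating a sample-based CRPS of that aggregate, normalized by the summed absolute targets. The sketch below follows that convention and is an assumption about the evaluation, not code taken from the paper.

```python
import numpy as np

def crps_from_samples(samples, target):
    """Sample-based CRPS for one scalar target, using the energy form
    CRPS = E|X - y| - 0.5 * E|X - X'| with X, X' drawn from the forecast."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - target))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def crps_sum(samples, targets):
    """CRPS_sum: CRPS of the series summed across target dimensions.

    samples: (num_samples, time, dim) forecast draws.
    targets: (time, dim) observed values.
    Returns the per-step CRPS of the aggregate series, summed over time and
    normalized by the summed absolute targets (a common convention)."""
    agg_samples = samples.sum(axis=-1)   # (num_samples, time)
    agg_targets = targets.sum(axis=-1)   # (time,)
    crps_t = [crps_from_samples(agg_samples[:, t], agg_targets[t])
              for t in range(agg_targets.shape[0])]
    return np.sum(crps_t) / np.sum(np.abs(agg_targets))
```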