Probabilistic Transformer For Time Series Analysis
Authors: Binh Tang, David S Matteson
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present our experiment results on two tasks, namely, time series forecasting and human motion prediction. These tasks are often studied independently, despite being almost identical as conditional prediction problems. Table 1: Test set CRPS_sum of time series forecasting models (lower is better). The means and standard deviations are computed over five runs using different random seeds. |
| Researcher Affiliation | Academia | Binh Tang Department of Statistics and Data Science Cornell University Ithaca, NY 14850 bvt5@cornell.edu David S. Matteson Department of Statistics and Data Science Cornell University Ithaca, NY 14850 matteson@cornell.edu |
| Pseudocode | Yes | Starting with a learnable, context-agnostic representation w_0, we recursively update w_t using a stochastic sample from p_θ(z_t \| z_{1:t−1}, x_{1:C}) and the positional embedding for the current time step t. The generating process for the time step t can be summarized by the following pseudocode: w̄_t = LayerNorm(w_{t−1} + Attention(w_{t−1}, w_{1:t−1}, w_{1:t−1})) (6); ŵ_t = LayerNorm(w̄_t + Attention(w̄_t, h_{1:C}, h_{1:C})) (7); z_t = Sample(N(z_t; MLP(ŵ_t), Softplus(MLP(ŵ_t)))) (8); w_t = LayerNorm(ŵ_t + MLP(z_t) + Position(t)) (9). A hedged code sketch of this step follows the table. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See supplemental material. We are also committed to open-sourcing our code upon publication. |
| Open Datasets | Yes | Following the experiment setup in [72, 73, 75], we evaluate our models and multiple competitive baselines on five popular public datasets: SOLAR, ELECTRICITY, TRAFFIC, TAXI, and WIKIPEDIA. |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix D in supplemental material. |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix D in supplemental material. |
| Software Dependencies | No | The paper mentions that hyperparameters and training processes are detailed in Appendix D, but it does not specify any software dependencies with version numbers in the provided text. |
| Experiment Setup | Yes | We use 8-head attention and 2-layer MLPs to parametrize the generative and inference models. The stochastic latent variables z_t are 16-dimensional, while the hidden representations w_t are in R^128. Our probabilistic transformers for SOLAR and ELECTRICITY have one stochastic layer, while those for the other datasets with higher-dimensional observations employ two layers. |
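
The generative step quoted in the Pseudocode row (Eqs. 6–9) and the sizes quoted in the Experiment Setup row (128-dim hidden states, 16-dim latents, 8-head attention, 2-layer MLPs) can be collected into a short PyTorch sketch. This is an illustrative reconstruction under those quoted equations, not the authors' released code: the `GenerativeStep` class, module names, and exact MLP widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GenerativeStep(nn.Module):
    """One decoding step of Eqs. (6)-(9): self-attention over past
    representations, cross-attention to the encoded context, a stochastic
    latent draw, and a residual update with the positional embedding."""

    def __init__(self, d_model: int = 128, z_dim: int = 16, n_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        # 2-layer MLPs as in the quoted setup; hidden widths are assumptions.
        self.mean_mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                      nn.Linear(d_model, z_dim))
        self.scale_mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                       nn.Linear(d_model, z_dim))
        self.z_mlp = nn.Sequential(nn.Linear(z_dim, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))

    def forward(self, w_prev, w_hist, h_ctx, pos_t):
        # w_prev: (B, 1, d)   previous representation w_{t-1}
        # w_hist: (B, t-1, d) earlier representations w_{1:t-1}
        # h_ctx:  (B, C, d)   encoded context h_{1:C}
        # pos_t:  (B, 1, d)   positional embedding for step t
        attn, _ = self.self_attn(w_prev, w_hist, w_hist)           # Eq. (6)
        w_bar = self.norm1(w_prev + attn)
        attn, _ = self.cross_attn(w_bar, h_ctx, h_ctx)             # Eq. (7)
        w_hat = self.norm2(w_bar + attn)
        mean = self.mean_mlp(w_hat)                                # Eq. (8)
        scale = F.softplus(self.scale_mlp(w_hat))
        z_t = mean + scale * torch.randn_like(scale)               # reparameterized Gaussian sample
        w_t = self.norm3(w_hat + self.z_mlp(z_t) + pos_t)          # Eq. (9)
        return w_t, z_t
```

As described in the quoted passage, this step would be applied autoregressively at generation time: starting from a learnable, context-agnostic w_0, each call appends the new w_t to the history and the sampled z_t drives the next update.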