VQ-TR: Vector Quantized Attention for Time Series Forecasting
Authors: Kashif Rasul, Andrew Bennett, Pablo Vicente, Umang Gupta, Hena Ghonia, Anderson Schneider, Yuriy Nevmyvaka
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this comparison, we find that VQ-TR performs better than, or comparably to, all other methods while being computationally efficient. |
| Researcher Affiliation | Collaboration | Kashif Rasul, Andrew Bennett, Pablo Vicente, Anderson Schneider & Yuriy Nevmyvaka (Morgan Stanley, New York, USA; kashif.rasul@gmail.com); Umang Gupta (USC, Los Angeles, USA); Hena Ghonia (Université de Montréal, Montréal, Canada) |
| Pseudocode | Yes | Section D.6, titled "VQ-TR IMPLEMENTATION DETAILS", provides Python code for the main components of the VQ-TR model, including the `FeedForward`, `Attention`, `VQAttention`, and `VQTrModel` classes (a minimal sketch of the `VQAttention` idea appears below the table). |
| Open Source Code | No | The paper states that the full code will be published on acceptance; hyperparameter details are provided in Section D.3, and complete details for running the experiments are deferred to the code release. |
| Open Datasets | Yes | We use the following open datasets: Exchange (Lai et al., 2018), Solar (Lai et al., 2018), Electricity, Traffic, Taxi, and Wikipedia, preprocessed exactly as in Salinas et al. (2019a). Footnotes in the paper give download URLs for Electricity, Traffic, Taxi, and Wikipedia (a loading sketch appears below the table). |
| Dataset Splits | No | The paper discusses training data (Dtrain) and testing (Dtest) and mentions using context/prediction windows, but it does not specify a distinct validation split (e.g., percentages or methodology) for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | The experiments were performed on a single Tesla V100S GPU with 32GB of RAM. |
| Software Dependencies | No | Section D.6 provides code snippets using PyTorch modules (e.g., `torch`, `torch.nn`, `torch.nn.functional`) and `vector_quantize_pytorch`, but specific version numbers for these libraries or for Python itself are not mentioned. |
| Experiment Setup | Yes | We use two encoder layers and six decoder layers, i.e., N = 2 and M = 6. We use J = 25 codebook vectors and train with a batch size of 256 for 20 epochs using the Adam (Kingma and Ba, 2015) optimizer with default parameters and a learning rate of 0.001. At inference time, we sample S = 100 times for each time point and feed these samples in parallel via the batch dimension autoregressively through the decoder to produce the reported metrics. (A hedged sketch of this setup follows the table.) |
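
For context on the pseudocode row, the following is a minimal, hedged sketch of the vector-quantized attention idea, assuming the `vector_quantize_pytorch` library that the paper's Section D.6 snippets import. The class name `VQAttention` matches the paper, but the constructor arguments and wiring here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from vector_quantize_pytorch import VectorQuantize

class VQAttention(nn.Module):
    """Illustrative sketch: cross-attention over a vector-quantized context.

    The paper's efficiency argument comes from attending over the J codebook
    vectors; this sketch only shows the quantize-then-attend wiring.
    """

    def __init__(self, dim: int, num_heads: int = 4, codebook_size: int = 25):
        super().__init__()
        # J = 25 codebook vectors in the reported setup.
        self.vq = VectorQuantize(dim=dim, codebook_size=codebook_size,
                                 commitment_weight=1.0)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries: torch.Tensor, context: torch.Tensor):
        # `indices` are codebook assignments; `commit_loss` is the
        # commitment term added to the training objective.
        quantized, indices, commit_loss = self.vq(context)
        out, _ = self.attn(queries, quantized, quantized)
        return out, commit_loss

layer = VQAttention(dim=64)
q, ctx = torch.randn(2, 16, 64), torch.randn(2, 48, 64)
out, aux_loss = layer(q, ctx)
print(out.shape)  # torch.Size([2, 16, 64])
```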
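
The paper does not say which toolkit it uses to fetch these datasets, but all six are distributed through the GluonTS dataset repository; the dataset names below are assumptions based on GluonTS's catalogue of the Salinas et al. (2019a) preprocessed versions, not names taken from the paper.

```python
# Hypothetical loading sketch via GluonTS; dataset names are assumed.
from gluonts.dataset.repository import get_dataset

for name in ["exchange_rate_nips", "solar_nips", "electricity_nips",
             "traffic_nips", "taxi_30min", "wiki-rolling_nips"]:
    dataset = get_dataset(name)
    print(name, dataset.metadata.freq, dataset.metadata.prediction_length)
```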
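
Finally, a minimal sketch of the reported training and sampling setup. The model and data here are stand-in placeholders (the real `VQTrModel` is given only in Section D.6), so only the optimizer settings, schedule, and the batch-dimension sampling trick reflect the paper.

```python
import torch
from torch import nn
from torch.optim import Adam

# Placeholders so the sketch runs end to end; the real VQTrModel and data
# pipeline are only sketched in Sections D.3/D.6 of the paper.
model = nn.Linear(32, 32)                                 # stands in for VQTrModel
train_loader = [torch.randn(256, 32) for _ in range(10)]  # batch size 256

# Adam with default parameters and learning rate 0.001, trained for 20 epochs.
optimizer = Adam(model.parameters(), lr=1e-3)
for epoch in range(20):
    for batch in train_loader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()                 # placeholder loss
        loss.backward()
        optimizer.step()

# Inference: S = 100 samples per time point, drawn in parallel by replicating
# the input along the batch dimension (the paper feeds these autoregressively
# through the decoder; a single step is shown here).
S = 100
x = torch.randn(1, 32)
with torch.no_grad():
    samples = model(x.repeat(S, 1))                       # (S, 32) sample paths
```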