VQ-TR: Vector Quantized Attention for Time Series Forecasting

Authors: Kashif Rasul, Andrew Bennett, Pablo Vicente, Umang Gupta, Hena Ghonia, Anderson Schneider, Yuriy Nevmyvaka

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this comparison, we find that VQ-TR performs better or comparably to all other methods while being computationally efficient.
Researcher Affiliation | Collaboration | Kashif Rasul, Andrew Bennett, Pablo Vicente, Anderson Schneider & Yuriy Nevmyvaka (Morgan Stanley, New York, USA; kashif.rasul@gmail.com); Umang Gupta (USC, Los Angeles, USA); Hena Ghonia (Université de Montréal, Montréal, Canada).
Pseudocode | Yes | Section D.6, titled "VQ-TR IMPLEMENTATION DETAILS", provides Python code for the main components of the VQ-TR model, including the `FeedForward`, `Attention`, `VQAttention`, and `VQTrModel` classes (a hedged sketch of the vector-quantized attention block is given after this table).
Open Source Code | No | The full code will be published on acceptance, and hyperparameter details are provided in Section D.3; complete details for running these experiments will be available with the code release.
Open Datasets | Yes | We use the following open datasets: Exchange (Lai et al., 2018), Solar (Lai et al., 2018), Electricity, Traffic, Taxi, and Wikipedia, preprocessed exactly as in Salinas et al. (2019a). Footnotes 3–6 in the paper provide the URLs for Electricity, Traffic, Taxi, and Wikipedia.
Dataset Splits | No | The paper discusses training data (Dtrain) and test data (Dtest) and mentions using context/prediction windows, but it does not specify a distinct validation split (e.g., percentages or methodology) for hyperparameter tuning or early stopping.
Hardware Specification | Yes | The experiments were performed on a single Tesla V100S GPU with 32GB of RAM.
Software Dependencies | No | Section D.6 provides code snippets using PyTorch modules (e.g., `torch`, `torch.nn`, `torch.nn.functional`) and `vector_quantize_pytorch`, but specific version numbers for these libraries or for Python itself are not mentioned.
Experiment Setup | Yes | We use two encoder layers and six decoder layers, i.e., N = 2 and M = 6. We use J = 25 codebook vectors and train with a batch size of 256 for 20 epochs using the Adam (Kingma and Ba, 2015) optimizer with default parameters and a learning rate of 0.001. At inference time, we sample S = 100 times for each time point and feed these samples in parallel via the batch dimension autoregressively through the decoder to produce the reported metrics. (A training-skeleton sketch based on these values also follows the table.)
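
As a point of reference for the pseudocode and software-dependency rows, below is a minimal sketch of how a vector-quantized attention block could be assembled from `torch` and `vector_quantize_pytorch`. The class name `VQAttentionSketch`, the projection layout, and the choice to quantize only the keys are illustrative assumptions; the authors' actual `VQAttention` code is given in their Section D.6 and has not been released separately.

```python
# Minimal sketch of a vector-quantized attention block (assumption: keys are
# quantized onto a small learned codebook before standard attention).
# Not the authors' code; dimensions and layout are illustrative only.
import torch
import torch.nn as nn
from vector_quantize_pytorch import VectorQuantize


class VQAttentionSketch(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4, codebook_size: int = 25):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        # codebook_size=25 mirrors the J = 25 codebook vectors quoted above.
        self.vq = VectorQuantize(dim=dim, codebook_size=codebook_size)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, sequence_length, dim)
        q = self.to_q(x)
        k, v = self.to_kv(x).chunk(2, dim=-1)
        # Quantize the keys; commit_loss would be added to the training loss.
        k_quantized, _indices, commit_loss = self.vq(k)
        attended, _ = self.attn(q, k_quantized, v)
        return self.out(attended), commit_loss


if __name__ == "__main__":
    block = VQAttentionSketch()
    y, aux_loss = block(torch.randn(2, 32, 64))
    print(y.shape, aux_loss.item())
```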
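
The experiment-setup row could translate into roughly the following training skeleton; only the optimizer, learning rate, batch size, and epoch count are quoted from the paper, while the model, dataset, and loss below are placeholders, not the VQ-TR model or its datasets.

```python
# Hypothetical training-loop skeleton wiring up only the hyperparameters quoted
# above (Adam with default parameters, lr=0.001, batch size 256, 20 epochs).
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(16, 1)  # stand-in for the actual VQTrModel
dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
loader = DataLoader(dataset, batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    for features, target in loader:
        optimizer.zero_grad()
        loss = F.mse_loss(model(features), target)  # placeholder objective
        loss.backward()
        optimizer.step()
```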