Correcting Forecasts with Multifactor Neural Attention
Authors: Matthew Riemer, Aditya Vempaty, Flavio Calmon, Fenno Heath, Richard Hull, Elham Khabiri
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model. |
| Researcher Affiliation | Industry | Matthew Riemer (MDRIEMER@US.IBM.COM), Aditya Vempaty (AVEMPAT@US.IBM.COM), Flavio P. Calmon (FDCALMON@US.IBM.COM), Fenno F. Heath III (THEATH@US.IBM.COM), Richard Hull (HULL@US.IBM.COM), Elham Khabiri (EKHABIRI@US.IBM.COM); IBM T.J. Watson Research Center, NY, USA |
| Pseudocode | No | The paper provides mathematical equations and a process flow diagram (Figure 2), but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions using 'two years of transaction data from 107 stores and 20 commodity classes of one of the largest retailers in America', 'historical weather information from 16 stations', 'Eventful.com', and a 'random 10% of all English tweets on Twitter', but does not provide any specific links, DOIs, repositories, or formal citations for public access to the datasets used in their experiments. |
| Dataset Splits | Yes | In all of our experiments we used the same 93 stores for training and 14 stores for validation. Our neural network models were all trained with Stochastic Gradient Descent (SGD) until convergence on the validation set. Hyperparameters are selected based on a grid search over the validation set. (This protocol is sketched in code below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions) used for the experiments. |
| Experiment Setup | Yes | In all of our experiments we used the same 93 stores for training and 14 stores for validation. Our neural network models were all trained with Stochastic Gradient Descent (SGD) until convergence on the validation set. Hyperparameters are selected based on a grid search over the validation set. In practice none of our neural network models find it useful to have large hidden sizes and are generally optimal between 10 and 100 units. We train all of our models first over a year of data with full parameter tuning, and then after each passing 3 months initialize with the old model and update the model based on the updating training set... (See the sketch below for a code rendering of this setup.) |
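
The protocol quoted in the "Dataset Splits" and "Experiment Setup" rows is concrete enough to render as code. The following is a minimal, hypothetical sketch only: the paper names no framework or feature dimensions, so PyTorch, the synthetic data, `N_FEATURES`, the batch size, learning rate, and patience values are all assumptions, and the plain feedforward regressor is a stand-in for the paper's multifactor attention model.

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# All names and shapes below are assumptions for illustration; the paper
# specifies only the store split, SGD, grid search, and hidden-size range.
N_FEATURES = 32                      # assumed feature size per example
HIDDEN_GRID = [10, 25, 50, 100]      # grid spanning the quoted 10-100 range

def make_dataset(n_rows):
    # Synthetic stand-in data, purely to keep the sketch runnable.
    return TensorDataset(torch.randn(n_rows, N_FEATURES),
                         torch.randn(n_rows, 1))

def to_loader(*datasets):
    return DataLoader(ConcatDataset(datasets), batch_size=16, shuffle=True)

train_sets = [make_dataset(93)]           # first year: 93 "training stores"
val_loader = to_loader(make_dataset(14))  # fixed 14 "validation stores"

def make_model(hidden):
    # Feedforward stand-in; the paper's actual model adds multifactor
    # attention over external data sources (weather, events, tweets).
    return nn.Sequential(nn.Linear(N_FEATURES, hidden), nn.Tanh(),
                         nn.Linear(hidden, 1))

def train_until_convergence(model, train, val, lr=0.01, patience=3):
    """SGD until validation loss stops improving, per the quoted setup."""
    opt, loss_fn = torch.optim.SGD(model.parameters(), lr=lr), nn.MSELoss()
    best, stale = float("inf"), 0
    while stale < patience:
        for x, y in train:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val)
        best, stale = (val_loss, 0) if val_loss < best else (best, stale + 1)
    return best

# Grid search over hidden size, scored on the validation stores.
scores = {h: train_until_convergence(make_model(h), to_loader(*train_sets),
                                     val_loader)
          for h in HIDDEN_GRID}
best_hidden = min(scores, key=scores.get)

# Rolling updates: train on one year, then after each passing quarter
# warm-start from the old model on the growing training set.
model = make_model(best_hidden)
train_until_convergence(model, to_loader(*train_sets), val_loader)
for _ in range(3):                        # stand-in quarterly data drops
    train_sets.append(make_dataset(93))   # training set keeps growing
    # Warm start: reuse the existing `model` instead of re-initializing it.
    train_until_convergence(model, to_loader(*train_sets), val_loader)
```

The sketch mirrors only what the quoted text states: the fixed 93/14 store split, SGD run until the validation loss stops improving, a grid search over hidden sizes in the quoted 10-100 range, and quarterly retraining that initializes from the previous model on the updated training set.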