Dynamic Local Regret for Non-convex Online Forecasting

Authors: Sergul Aydore, Tianhao Zhu, Dean P. Foster

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using a real-world dataset we show that our time-smoothed approach yields several benefits when compared with state-of-the-art competitors: results are more stable against new data; training is more robust to hyperparameter selection; and our approach is more computationally efficient than the alternatives. We provide extensive experiments using a real-world data set to support our claims.
Researcher Affiliation | Collaboration | Sergul Aydore, Department of ECE, Stevens Institute of Technology, Hoboken, NJ, USA (sergulaydore@gmail.com); Tianhao Zhu, Department of ECE, Stevens Institute of Technology, Hoboken, NJ, USA (romeo.zhuth@gmail.com); Dean Foster, Amazon, New York, NY, USA (foster@amazon.com)
Pseudocode | Yes | Algorithm 1, Static Time-Smoothed Stochastic Gradient Descent (STS-SGD), and Algorithm 2, Dynamic Exponentially Time-Smoothed Stochastic Gradient Descent (DTS-SGD), are presented. A sketch of the time-smoothed update appears after this table.
Open Source Code | Yes | All of our results can be reproduced using the code in https://github.com/Timbasa/Dynamic_Local_Regret_for_Non-convex_Online_Forecasting_NeurIPS2019.
Open Datasets | Yes | We use the data from GEFCom2014 [Barta et al., 2017] for our experiments. It is a public dataset released for a competition in 2014.
Dataset Splits | No | The paper specifies training and test splits based on time periods but does not explicitly define a separate validation set for hyperparameter tuning.
Hardware Specification | No | The paper mentions "GPU seconds" in relation to computation time but does not specify any particular GPU models, CPU models, memory, or other hardware used for the experiments.
Software Dependencies | No | The paper mentions using an LSTM model and RNNs but does not provide version numbers for any software dependencies, libraries, or frameworks used.
Experiment Setup | Yes | Training: during the update we allow only one pass over the data, i.e., the number of epochs is set to 1. To make the learning curves smoother, we adjust the learning rate at each update t so that η_t = η/√t, where η is the initial learning rate; in our experiments we use 1, 3, 5, and 9 for the value of η. Our model contains two LSTM layers and three fully connected linear layers, each of which represents one of the three quantiles. The input to our LSTM model is 48 × 44, where 48 is the number of hours in two days. The output is the prediction of three quantiles of the next day's values. A sketch of this setup appears after the table.
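
The paper's Algorithms 1 and 2 are not reproduced verbatim in this report. The following is a minimal sketch of a dynamic exponentially time-smoothed SGD loop, assuming a recursive exponential weighting of past gradients (each evaluated at the iterate on which it was originally computed, which is the source of the computational savings the paper claims over recomputing past gradients at the current point) together with the η/√t schedule quoted above. The decay factor alpha, the (1 - alpha) normalization, the function names, and the toy loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dts_sgd(grad_fn, x0, T, eta=1.0, alpha=0.9):
    """Sketch of a dynamic exponentially time-smoothed SGD loop.

    grad_fn(t, x) returns the stochastic gradient of the loss revealed at
    round t, evaluated at x. Past gradients are reused rather than
    recomputed at the current iterate; the exact weighting and
    normalization here are assumptions, not the paper's Algorithm 2.
    """
    x = np.asarray(x0, dtype=float)
    smoothed = np.zeros_like(x)          # exponentially weighted sum of past gradients
    for t in range(1, T + 1):
        g = grad_fn(t, x)                # gradient of the newly revealed loss at x_t
        smoothed = alpha * smoothed + g  # recursive smoothing: weight alpha**i on the i-th past gradient
        eta_t = eta / np.sqrt(t)         # step-size schedule quoted in the table above
        x = x - eta_t * (1 - alpha) * smoothed
    return x

# Toy usage: online quadratic losses f_t(x) = 0.5 * ||x - c_t||^2, so grad f_t(x) = x - c_t.
rng = np.random.default_rng(0)
centers = rng.normal(size=(100, 3))
x_final = dts_sgd(lambda t, x: x - centers[t - 1], x0=np.zeros(3), T=100)
print(x_final)
```

The recursive form keeps only a single running vector instead of a window of stored gradients; the paper's Algorithm 2 may instead use a finite window of length w with its own normalization, so treat this as a sketch of the idea rather than the algorithm itself.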
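
For the experiment setup in the last row, the following is a hedged PyTorch-style sketch: two LSTM layers over a 48 × 44 input, three linear heads (one per quantile), a single pass over the data, and the η/√t learning-rate decay. The hidden size, batch size, forecast horizon, quantile levels, and the pinball loss are assumptions filled in for illustration; the table only specifies the layer counts, the input shape, and the learning-rate schedule.

```python
import torch
import torch.nn as nn

class QuantileLSTM(nn.Module):
    """Two LSTM layers followed by three linear heads, one per quantile."""
    def __init__(self, n_features=44, hidden=64, horizon=24, quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        # one fully connected head per quantile, each predicting the next day's values
        self.heads = nn.ModuleList([nn.Linear(hidden, horizon) for _ in quantiles])
        self.quantiles = quantiles

    def forward(self, x):                    # x: (batch, 48, 44) -- two days of hourly features
        out, _ = self.lstm(x)
        last = out[:, -1, :]                 # summary of the 48-hour input window
        return torch.stack([head(last) for head in self.heads], dim=1)  # (batch, 3, horizon)

def pinball_loss(pred, target, quantiles=(0.1, 0.5, 0.9)):
    """Standard quantile (pinball) loss; assumed here, not quoted from the paper."""
    losses = []
    for i, q in enumerate(quantiles):
        diff = target - pred[:, i, :]
        losses.append(torch.maximum(q * diff, (q - 1) * diff).mean())
    return sum(losses)

# Dummy data stream standing in for GEFCom2014 batches: (two-day input, next-day target).
data_stream = [(torch.randn(8, 48, 44), torch.randn(8, 24)) for _ in range(10)]

model = QuantileLSTM()
opt = torch.optim.SGD(model.parameters(), lr=1.0)   # initial eta; the paper tries 1, 3, 5, 9
for t, (x, y) in enumerate(data_stream, start=1):   # a single pass over the data (1 epoch)
    for group in opt.param_groups:
        group["lr"] = 1.0 / t ** 0.5                # eta_t = eta / sqrt(t)
    opt.zero_grad()
    pinball_loss(model(x), y).backward()
    opt.step()
```

A plain SGD optimizer with a manually decayed step size is used here only to keep the sketch short; the paper's own training uses the time-smoothed updates sketched above rather than vanilla SGD.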