Do RNN and LSTM have Long Memory?

Authors: Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section reports several numerical experiments. We first compare the models using time series forecasting tasks on four long memory datasets and one short memory dataset. Then, we investigate the effect of the model parameter K on the forecasting performance. Lastly, we apply the proposed models to two sentiment analysis tasks.
Researcher Affiliation | Collaboration | (1) Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China; (2) Huawei Noah's Ark Lab, Hong Kong, China.
Pseudocode | No | Explanation: The paper provides mathematical equations and descriptions of the model structures (e.g., in Section 3.1 for MRNN) but does not include formal pseudocode or an algorithm block.
Open Source Code | Yes | Our implementation in PyTorch is available at https://github.com/huawei-noah/noah-research/tree/master/mRNN-mLSTM.
Open Datasets | Yes | Metro interstate traffic volume: The raw dataset contains hourly Interstate 94 Westbound traffic volume for MnDOT ATR station 301, roughly midway between Minneapolis and St Paul, MN, obtained from the MN Department of Transportation via the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume, 2019; accessed 2019-12-28). We convert it to de-seasoned daily data with length 1860 (1400 + 200 + 259); a preprocessing sketch follows the table.
Dataset Splits | Yes | We split the datasets into training, validation and test sets, and report their lengths below using the notation (ntrain + nval + ntest). MSE is the target loss function for training. We perform one-step rolling forecasts and calculate test RMSE, MAE, and MAPE. ARFIMA series: We generated a series of length 4001 (2000 + 1200 + 800) using the model $(1 - 0.7B + 0.4B^2)(1 - B)^{0.4} Y_t = (1 - 0.2B)\varepsilon_t$, which has an obvious long memory effect; a simulation sketch follows the table.
Hardware Specification | No | Explanation: The paper does not provide specific details about the hardware used, such as exact GPU or CPU models, memory specifications, or processor types.
Software Dependencies | No | All the networks are implemented in PyTorch.
Experiment Setup | Yes | We use the Adam algorithm with learning rate 0.01 for optimization. The optimization is stopped when the loss function drops by less than $10^{-5}$, or has been increasing for 100 steps, or has reached 1000 steps in total. The learned model is chosen to be the one with the smallest loss on the validation set. A training-loop sketch follows the table.
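
Preprocessing sketch (Open Datasets row). The paper converts the hourly UCI series to de-seasoned daily data but does not name the procedure. The pandas sketch below is a minimal illustration under two assumptions: "daily" means summing the 24 hourly counts, and "de-seasoned" means removing day-of-week means. The file and column names follow the UCI distribution.

```python
import pandas as pd

# The UCI file ships with 'date_time' and 'traffic_volume' columns.
df = pd.read_csv("Metro_Interstate_Traffic_Volume.csv", parse_dates=["date_time"])

# Aggregate hourly counts to daily totals.
daily = df.set_index("date_time")["traffic_volume"].resample("D").sum()

# Remove weekly seasonality by subtracting each weekday's mean level.
# (Assumption: the paper does not state its de-seasoning method.)
deseasoned = daily - daily.groupby(daily.index.dayofweek).transform("mean")
```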
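
Simulation sketch (Dataset Splits row). One way to reproduce the quoted ARFIMA(2, 0.4, 1) generator is to simulate the ARMA(2, 1) part and then apply truncated fractional integration $(1 - B)^{-d}$, whose MA weights obey the recursion $\psi_0 = 1$, $\psi_j = \psi_{j-1}(j - 1 + d)/j$. This illustrates the recipe rather than the authors' generator; the burn-in length, seed, innovation distribution, and split slicing are assumptions.

```python
import numpy as np

def simulate_arfima(n, d=0.4, burn=1000, seed=0):
    """Simulate (1 - 0.7B + 0.4B^2)(1 - B)^0.4 Y_t = (1 - 0.2B) eps_t."""
    rng = np.random.default_rng(seed)
    m = n + burn
    eps = rng.standard_normal(m)  # assumption: standard normal innovations

    # ARMA(2, 1) part: x_t = 0.7 x_{t-1} - 0.4 x_{t-2} + eps_t - 0.2 eps_{t-1}
    x = np.zeros(m)
    for t in range(m):
        x[t] = eps[t]
        if t >= 1:
            x[t] += 0.7 * x[t - 1] - 0.2 * eps[t - 1]
        if t >= 2:
            x[t] -= 0.4 * x[t - 2]

    # Fractional integration y = (1 - B)^{-d} x via the truncated expansion
    # (1 - B)^{-d} = sum_j psi_j B^j with psi_j = psi_{j-1} * (j - 1 + d) / j.
    psi = np.empty(m)
    psi[0] = 1.0
    for j in range(1, m):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    y = np.convolve(x, psi)[:m]

    return y[burn:]  # drop the burn-in so the start-up transient fades

series = simulate_arfima(4001)
# Contiguous slices matching the paper's (2000 + 1200 + 800) notation.
train, val, test = series[:2000], series[2000:3200], series[3200:4000]
```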
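
Training-loop sketch (Experiment Setup row). The quoted setup pins down the optimizer and stopping rule exactly, so a minimal PyTorch loop can be sketched; the model interface, full-batch gradient steps, and the reading of "drops by less than $10^{-5}$" as an absolute change in training loss are assumptions.

```python
import copy
import torch

def train(model, x_train, y_train, x_val, y_val,
          lr=0.01, tol=1e-5, patience=100, max_steps=1000):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, learning rate 0.01
    loss_fn = torch.nn.MSELoss()                       # MSE is the target loss
    best_val, best_state = float("inf"), None
    prev_loss, rising = float("inf"), 0

    for step in range(max_steps):                      # at most 1000 steps in total
        opt.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        opt.step()

        with torch.no_grad():                          # track validation loss per step
            val_loss = loss_fn(model(x_val), y_val).item()
        if val_loss < best_val:                        # remember the best model so far
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())

        cur = loss.item()
        rising = rising + 1 if cur > prev_loss else 0
        # Stop when the loss changes by less than 1e-5 (our reading of
        # "drops by less than 10^-5") or has been increasing for 100 steps.
        if abs(prev_loss - cur) < tol or rising >= patience:
            break
        prev_loss = cur

    if best_state is not None:                         # smallest validation loss wins
        model.load_state_dict(best_state)
    return model
```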