Do RNN and LSTM have Long Memory?
Authors: Jingyu Zhao, Feiqing Huang, Jia Lv, Yanjie Duan, Zhen Qin, Guodong Li, Guangjian Tian
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section reports several numerical experiments. We first compare the models using time series forecasting tasks on four long memory datasets and one short memory dataset. Then, we investigate the effect of the model parameter K on the forecasting performance. Lastly, we apply the proposed models to two sentiment analysis tasks. |
| Researcher Affiliation | Collaboration | ¹Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China; ²Huawei Noah's Ark Lab, Hong Kong, China. |
| Pseudocode | No | Explanation: The paper provides mathematical equations and descriptions of the model structures (e.g., in Section 3.1 for MRNN) but does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Our implementation in PyTorch is available at https://github.com/huawei-noah/noah-research/tree/master/mRNN-mLSTM. |
| Open Datasets | Yes | Metro interstate traffic volume: the raw dataset contains hourly Interstate 94 Westbound traffic volume for MnDOT ATR station 301, roughly midway between Minneapolis and St Paul, MN, obtained from the MN Department of Transportation (UCI). We convert it to de-seasoned daily data with length 1860 (1400 + 200 + 259). Reference: UCI Machine Learning Repository, Metro Interstate Traffic Volume Data Set, https://archive.ics.uci.edu/ml/datasets/Metro+Interstate+Traffic+Volume, 2019. Accessed: 2019-12-28. (A hedged preprocessing sketch follows the table.) |
| Dataset Splits | Yes | We split the datasets into training, validation and test sets, and report their lengths below using the notation (n_train + n_val + n_test). MSE is the target loss function for training. We perform one-step rolling forecasts and calculate test RMSE, MAE, and MAPE. ARFIMA series: we generated a series of length 4001 (2000 + 1200 + 800) using the model (1 − 0.7B + 0.4B²)(1 − B)^0.4 Y_t = (1 − 0.2B)ε_t, which exhibits an obvious long memory effect. (A simulation sketch follows the table.) |
| Hardware Specification | No | Explanation: The paper does not provide specific details about the hardware used, such as exact GPU or CPU models, memory specifications, or processor types. |
| Software Dependencies | No | All the networks are implemented in PyTorch. |
| Experiment Setup | Yes | We use the Adam algorithm with learning rate 0.01 for optimization. The optimization is stopped when the loss function drops by less than 10⁻⁵, has been increasing for 100 steps, or has reached 1000 steps in total. The learned model is chosen to be the one with the smallest loss on the validation set. (A training-loop sketch follows the table.) |
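
The three sketches below expand on rows of the table; none of them is taken from the authors' released code. First, the Open Datasets row says only that the hourly UCI traffic series is converted to de-seasoned daily data. The pandas sketch below shows one plausible reading of that step: the file name, the aggregation by daily sum, and the day-of-week de-seasoning are all assumptions, while the `date_time` and `traffic_volume` column names come from the UCI release.

```python
import pandas as pd

# Hypothetical file name for the UCI "Metro Interstate Traffic Volume" data.
df = pd.read_csv("Metro_Interstate_Traffic_Volume.csv", parse_dates=["date_time"])

# Aggregate the hourly counts to daily totals (assumed aggregation).
daily = df.set_index("date_time")["traffic_volume"].resample("D").sum()

# Remove weekly seasonality by subtracting each day-of-week mean
# (assumed de-seasoning; the paper does not describe its procedure).
deseasoned = daily - daily.groupby(daily.index.dayofweek).transform("mean")
```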
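
Second, the Dataset Splits row quotes the ARFIMA model used to generate the synthetic long memory series. A minimal NumPy sketch of that simulation, assuming a truncated binomial expansion for the fractional integration (1 − B)^(−0.4), Gaussian innovations, and a 500-point burn-in (none of which is specified in the paper):

```python
import numpy as np

def frac_integration_weights(d, n):
    """Binomial weights of (1 - B)^(-d): psi_0 = 1, psi_j = psi_{j-1} * (j - 1 + d) / j."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

def simulate_arfima(n, d=0.4, burn=500, seed=0):
    """Simulate (1 - 0.7B + 0.4B^2)(1 - B)^d Y_t = (1 - 0.2B) eps_t."""
    rng = np.random.default_rng(seed)
    total = n + burn
    eps = rng.standard_normal(total)
    # ARMA(2, 1) part: x_t = 0.7 x_{t-1} - 0.4 x_{t-2} + eps_t - 0.2 eps_{t-1}.
    x = np.zeros(total)
    for t in range(total):
        x[t] = eps[t]
        if t >= 1:
            x[t] += 0.7 * x[t - 1] - 0.2 * eps[t - 1]
        if t >= 2:
            x[t] -= 0.4 * x[t - 2]
    # Fractional integration: y_t = sum_j psi_j * x_{t-j}, truncated at lag t.
    psi = frac_integration_weights(d, total)
    y = np.array([np.dot(psi[:t + 1][::-1], x[:t + 1]) for t in range(total)])
    return y[burn:]

series = simulate_arfima(4001)
# Split following the reported (n_train + n_val + n_test) = (2000 + 1200 + 800).
train, val, test = series[:2000], series[2000:3200], series[3200:]
```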
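
Third, the Experiment Setup row gives the optimizer and stopping rules. The PyTorch sketch below is one way to wire those rules together; it assumes the 10⁻⁵ improvement tolerance and the "increasing for 100 steps" counter are both evaluated on the training loss at every step, which the paper does not state explicitly.

```python
import torch

def train(model, train_loader, x_val, y_val, max_steps=1000, patience=100, tol=1e-5):
    opt = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam, learning rate 0.01
    loss_fn = torch.nn.MSELoss()                         # MSE is the training target
    best_val, best_state = float("inf"), None
    prev_loss, rising, step = float("inf"), 0, 0
    while step < max_steps:
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            step += 1

            # Keep the parameters with the smallest validation loss.
            with torch.no_grad():
                val_loss = loss_fn(model(x_val), y_val).item()
            if val_loss < best_val:
                best_val = val_loss
                best_state = {k: v.clone() for k, v in model.state_dict().items()}

            # Stopping rules: training loss drops by less than 1e-5, has been
            # rising for 100 consecutive steps, or 1000 steps have elapsed.
            rising = rising + 1 if loss.item() > prev_loss else 0
            improvement = prev_loss - loss.item()
            if 0 <= improvement < tol or rising >= patience or step >= max_steps:
                model.load_state_dict(best_state)
                return model
            prev_loss = loss.item()
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```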