MC-LSTM: Mass-Conserving LSTM
Authors: Pieter-Jan Hoedt, Frederik Kratzert, Daniel Klotz, Christina Halmich, Markus Holzleitner, Grey S. Nearing, Sepp Hochreiter, Günter Klambauer
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MC-LSTMs set a new state-of-the-art for neural arithmetic units at learning arithmetic operations, such as addition tasks, which have a strong conservation law, as the sum is constant over time. Further, MC-LSTM is applied to traffic forecasting, modeling a damped pendulum, and a large benchmark dataset in hydrology, where it sets a new state-of-the-art for predicting peak flows. (Section 5, Experiments) In the following, we demonstrate the broad applicability and high predictive performance of MC-LSTM in settings where mass conservation is required. |
| Researcher Affiliation | Collaboration | (1) ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; (2) Google Research, Mountain View, CA, USA; (3) Institute of Advanced Research in Artificial Intelligence (IARAI). |
| Pseudocode | No | The paper describes the architecture using mathematical equations and a schematic diagram (Figure 1) but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code for the experiments can be found at https://github.com/ml-jku/mc-lstm |
| Open Datasets | Yes | An ensemble of 10 MC-LSTMs was trained on 10 years of data from 447 basins using the publicly-available CAMELS dataset (Newman et al., 2015; Addor et al., 2017). |
| Dataset Splits | Yes | For training, we sampled sequences of length 100 with two random numbers (between 0 and 0.5) that had to be summed up. For testing, we used 100 000 sequences that were not used during training. (Appendix B.1.1) The training and validation sets were constructed by randomly sampling 10 000 input sequences of length between 1 and 20. (Appendix B.1.2) We used the same split into calibration and validation periods as in (Kratzert et al., 2019b): The period 1980–1999 for calibration and 2000–2009 for validation. (Appendix B.4.2) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or detailed computer specifications used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | For training, we sampled sequences of length 100 with two random numbers (between 0 and 0.5) that had to be summed up. We used batch sizes of 128. We trained for 50 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001. The LSTM had 20 hidden units and was initialized using orthogonal initialization (Saxe et al., 2013). We used a high initial forget gate bias of 1.0 (Gers & Schmidhuber, 2000). (Appendix B.1.1) |
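
The addition-task configuration quoted in the "Dataset Splits" and "Experiment Setup" rows (sequences of length 100 with two random values in [0, 0.5] to be summed, an LSTM with 20 hidden units, orthogonal initialization, a high initial forget-gate bias of 1.0, Adam with learning rate 0.001, batch size 128, 50 epochs) can be sketched as below. This is a minimal PyTorch reconstruction of the described LSTM baseline setup, not the authors' released code (see https://github.com/ml-jku/mc-lstm); the two-channel input encoding (value plus marker), the number of steps per epoch, and all names are assumptions for illustration.

```python
# Hypothetical sketch of the quoted addition-task training setup; not the
# authors' implementation (their code is at github.com/ml-jku/mc-lstm).
import torch
import torch.nn as nn

SEQ_LEN, HIDDEN, BATCH, EPOCHS = 100, 20, 128, 50
STEPS_PER_EPOCH = 100  # assumption; the quoted setup does not specify this

def make_batch(batch_size=BATCH, seq_len=SEQ_LEN):
    """Addition task: channel 0 carries random values in [0, 0.5), channel 1
    marks the two positions whose values must be summed (assumed encoding)."""
    values = torch.rand(batch_size, seq_len, 1) * 0.5
    marks = torch.zeros(batch_size, seq_len, 1)
    idx = torch.stack([torch.randperm(seq_len)[:2] for _ in range(batch_size)])
    marks.scatter_(1, idx.unsqueeze(-1), 1.0)
    targets = (values * marks).sum(dim=1)          # sum of the two marked values
    return torch.cat([values, marks], dim=-1), targets

class LSTMBaseline(nn.Module):
    def __init__(self, hidden=HIDDEN):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        for name, p in self.lstm.named_parameters():
            if "weight" in name:
                nn.init.orthogonal_(p)             # orthogonal initialization
            else:
                nn.init.zeros_(p)
        # PyTorch packs LSTM biases as [input, forget, cell, output] gates;
        # set the forget-gate slice to the high initial bias of 1.0.
        self.lstm.bias_hh_l0.data[hidden:2 * hidden] = 1.0

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])               # predict from the last step

model = LSTMBaseline()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(EPOCHS):
    for _ in range(STEPS_PER_EPOCH):
        x, y = make_batch()
        optim.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optim.step()
```

Freshly sampling each training batch mirrors the quoted description of randomly sampled training sequences; evaluation on the 100 000 held-out test sequences mentioned in the "Dataset Splits" row would follow the same `make_batch` pattern with a fixed, pre-generated set.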