On the approximation properties of recurrent encoder-decoder architectures

Authors: Zhong Li, Haotian Jiang, Qianxiao Li

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main results, their consequences and numerical illustrations are presented in Section 4. Our results in this work reveal that the encoder-decoders have a special temporal product structure which is intrinsically different from other sequence modelling architectures. In Figure 2, we train linear encoder-decoder models to learn three relationships of different ranks determined by various decay patterns of singular values, given in (a), (b) and (c). In Figure 3, we perform experiments on the forced Lorenz 96 system (Lorenz, 1996), which parameterises a high-dimensional and nonlinear relationship between input forcing and model states.
Researcher Affiliation | Academia | Zhong Li, School of Mathematical Sciences, Peking University (li zhong@pku.edu.cn); Haotian Jiang, Department of Mathematics, National University of Singapore (e0012663@u.nus.edu); Qianxiao Li, Department of Mathematics, National University of Singapore (qianxiao@nus.edu.sg)
Pseudocode | No | The paper does not contain any sections explicitly labeled as pseudocode or algorithms.
Open Source Code | No | The source code for numerical tests can be made available upon request.
Open Datasets | Yes | In Figure 3, we perform experiments on the forced Lorenz 96 system (Lorenz, 1996), which parameterises a high-dimensional and nonlinear relationship between input forcing and model states. (A data-generation sketch follows the table.)
Dataset Splits | No | The paper does not explicitly specify the training, validation, and test dataset splits. It mentions training models (e.g., the linear encoder-decoder models in Figure 2) but does not describe how the data are split.
Hardware Specification | No | The paper does not provide any details of the hardware used to run its experiments.
Software Dependencies | No | The paper does not specify any software packages or version numbers needed for reproducibility.
Experiment Setup | Yes | Let m = 128 be the hidden dimension and N = 1, 2, ..., 32 be the size of the coding vector v; then x_s, o_t, b_O ∈ R^K, h_s, b_E, b_D, b_2 ∈ R^m, W_E, W_D ∈ R^{m×m}, b_1 ∈ R^N, W_O ∈ R^{m×K}, Q ∈ R^{m×N}, P ∈ R^{N×m}. Note that we construct the model with a fixed hidden dimension m but different N, so only the sizes of Q, P, b_1, b_2 vary, while the sizes of the other parameters remain unchanged. We utilise the Adam optimiser and train from Enc-Dec(1) to Enc-Dec(32). For Enc-Dec(1), we use a normal random initialisation and train for 3000 epochs until the error stabilises. For Enc-Dec(N) with N > 1, we use the parameters trained for Enc-Dec(N-1) as the initialisation. For the parameters Q, P, b_1, b_2, we pad them to match the size of Enc-Dec(N), using normal distributions as initialisations for the new entries.
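
The Experiment Setup row describes a curriculum over the coding dimension N. The following is a minimal PyTorch sketch of that curriculum, not the authors' code: the parameter shapes, the use of Adam, the 3000-epoch budget for Enc-Dec(1), and the pad-and-reuse initialisation for N > 1 follow the quoted setup, while the recurrence equations, the input map U, the learning rate, the initialisation scale, the output length, and the epoch budget for N > 1 are assumptions.

```python
# Hypothetical sketch of the Enc-Dec(N) training curriculum described above (not the authors' code).
import torch
import torch.nn as nn


class LinearEncDec(nn.Module):
    """Linear encoder-decoder with hidden size m, coding size N, output size K.

    Parameter shapes follow the quoted setup (W_E, W_D: m x m; P: N x m; Q: m x N;
    b_1: N; b_E, b_D, b_2: m; W_O: m x K; b_O: K). The recurrence equations and the
    input map U are assumptions.
    """

    def __init__(self, m: int = 128, N: int = 1, K: int = 1):
        super().__init__()

        def par(*shape):
            # normal random initialisation (the 0.01 scale is an assumption)
            return nn.Parameter(0.01 * torch.randn(*shape))

        self.W_E, self.b_E = par(m, m), par(m)   # encoder recurrence
        self.U = par(m, K)                       # input map (assumed, not listed in the quote)
        self.P, self.b_1 = par(N, m), par(N)     # final hidden state -> coding vector v
        self.Q, self.b_2 = par(m, N), par(m)     # coding vector -> decoder initial state
        self.W_D, self.b_D = par(m, m), par(m)   # decoder recurrence
        self.W_O, self.b_O = par(m, K), par(K)   # decoder state -> output

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, T, K)
        h = x.new_zeros(x.shape[0], self.W_E.shape[0])
        for s in range(x.shape[1]):                       # encode the input sequence
            h = h @ self.W_E.T + x[:, s] @ self.U.T + self.b_E
        v = h @ self.P.T + self.b_1                       # coding vector of size N
        g = v @ self.Q.T + self.b_2                       # decoder initial state
        outs = []
        for _ in range(x.shape[1]):                       # decode (output length assumed = T)
            g = g @ self.W_D.T + self.b_D
            outs.append(g @ self.W_O + self.b_O)
        return torch.stack(outs, dim=1)                   # (batch, T, K)


def grow(prev: LinearEncDec, N: int, m: int = 128, K: int = 1) -> LinearEncDec:
    """Initialise Enc-Dec(N) from a trained Enc-Dec(N-1): copy every parameter of
    matching shape; for parameters whose shape depends on N (Q, P, b_1), copy the
    old block and keep the freshly drawn normal entries as the padding described above."""
    model = LinearEncDec(m=m, N=N, K=K)
    old = dict(prev.named_parameters())
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.shape == old[name].shape:
                p.copy_(old[name])
            else:
                p[tuple(slice(0, s) for s in old[name].shape)] = old[name]
    return model


def train_curriculum(x, y, m=128, K=1, lr=1e-3, first_epochs=3000, later_epochs=500):
    """Adam training from Enc-Dec(1) to Enc-Dec(32). The learning rate and the epoch
    budget for N > 1 are assumptions; the quote only gives 3000 epochs for N = 1."""
    models, model = [], LinearEncDec(m=m, N=1, K=K)
    for N in range(1, 33):
        if N > 1:
            model = grow(model, N=N, m=m, K=K)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(first_epochs if N == 1 else later_epochs):
            opt.zero_grad()
            loss = ((model(x) - y) ** 2).mean()
            loss.backward()
            opt.step()
        models.append(model)
    return models
```

With this warm start, Enc-Dec(N) begins close to the function learned by Enc-Dec(N-1) (the small random padding perturbs it only slightly), so each stage mainly has to exploit the newly added coding direction.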
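
For the Lorenz 96 experiment cited under Open Datasets, the data come from simulating the dynamics rather than from a downloaded dataset. Below is a minimal, hypothetical generator based on the standard system of Lorenz (1996), dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F_i with cyclic indices; the state dimension, step size, forcing distribution, and the piecewise-constant, per-coordinate forcing are assumptions, since the quoted text only says that the input forcing drives the model states.

```python
# Hypothetical generator for forced Lorenz 96 trajectories (not the authors' pipeline).
import numpy as np


def lorenz96_rhs(x, F):
    """Lorenz 96 right-hand side: dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F_i."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F


def simulate(F_seq, x0, dt=0.01):
    """Integrate with fourth-order Runge-Kutta under a piecewise-constant forcing sequence.

    F_seq: (T, d) array of forcings (the inputs); returns the (T, d) state trajectory
    (the outputs). Treating the forcing as constant over each step is an assumption.
    """
    x, traj = x0.copy(), []
    for F in F_seq:
        k1 = lorenz96_rhs(x, F)
        k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
        k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
        k4 = lorenz96_rhs(x + dt * k3, F)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.stack(traj)


# Example: d = 10 state variables, forcing fluctuating around the classical F = 8.
rng = np.random.default_rng(0)
d, T = 10, 500
F_seq = 8.0 + rng.normal(scale=0.5, size=(T, d))
x0 = 8.0 + 0.01 * rng.normal(size=d)
states = simulate(F_seq, x0)   # inputs F_seq -> output states, as in the Figure 3 setting
```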