Learning Useful Representations of Recurrent Neural Network Weight Matrices

Authors: Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct empirical analyses and comparisons across the different encoder architectures using these datasets, showing which encoders are more effective.
Researcher Affiliation | Academia | The Swiss AI Lab IDSIA, USI & SUPSI; AI Initiative, KAUST.
Pseudocode | No | The paper describes methods and architectures but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release the first two model zoo datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other of classifiers of sequentially processed MNIST digits. https://github.com/vincentherrmann/rnn-weights-representation-learning
Open Datasets | Yes | To evaluate the methods described and foster further research, we develop and release two model zoo datasets for RNNs. ... https://github.com/vincentherrmann/rnn-weights-representation-learning
Dataset Splits | Yes | The datasets are divided into training, validation, and out-of-distribution (OOD) test splits, with tasks in each split being non-overlapping. (A sketch of such a task-disjoint split follows the table.)
Hardware Specification | Yes | We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award.
Software Dependencies | No | The paper mentions the AdamW optimizer and a learning rate schedule, but it does not specify software versions for programming languages, libraries, or other dependencies needed to reproduce the experiments.
Experiment Setup | Yes | The hyperparameters of these encoders are selected to ensure a comparable number of parameters across all models. Each encoder generates a 16-dimensional representation z. An LSTM with two layers functions as the emulator Aξ. The conditioning of Aξ on an RNN fθ is implemented by incorporating a linear projection of the corresponding representation z to the BOS token of the input sequence of Aξ. More details and hyperparameters can be found in Appendix D. (A sketch of this conditioning mechanism follows the table.)
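To make the Dataset Splits row concrete, here is a minimal sketch of a task-disjoint partition, where tasks (not individual models) are divided so that the training, validation, and OOD test splits share no tasks. The function and variable names (make_task_splits, task_ids) are hypothetical and not taken from the released code.

```python
import random

def make_task_splits(task_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Partition task IDs into non-overlapping train/val/OOD-test sets."""
    rng = random.Random(seed)
    tasks = list(task_ids)
    rng.shuffle(tasks)
    n_test = int(len(tasks) * test_frac)
    n_val = int(len(tasks) * val_frac)
    test = tasks[:n_test]
    val = tasks[n_test:n_test + n_val]
    train = tasks[n_test + n_val:]
    # Tasks in each split are non-overlapping by construction.
    assert not (set(train) & set(val)) and not (set(train) & set(test))
    return train, val, test

train_tasks, val_tasks, ood_test_tasks = make_task_splits(range(100))
```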
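The conditioning mechanism quoted in the Experiment Setup row can be illustrated with a short PyTorch sketch: the 16-dimensional representation z is linearly projected and added to the embedding of the BOS token before the two-layer LSTM emulator processes the sequence. The module names, vocabulary and hidden sizes, and overall interface here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionedEmulator(nn.Module):
    """Hypothetical emulator A_xi conditioned on a representation z of an RNN f_theta."""
    def __init__(self, vocab_size, d_model=128, z_dim=16, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.z_proj = nn.Linear(z_dim, d_model)  # projects z into the token embedding space
        self.lstm = nn.LSTM(d_model, d_model, num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, z):
        # tokens: (batch, seq_len), with tokens[:, 0] being the BOS token
        # z:      (batch, z_dim), the 16-dimensional representation of the RNN
        x = self.embed(tokens)
        # Condition the emulator by adding the projected z to the BOS embedding.
        bos = x[:, :1] + self.z_proj(z).unsqueeze(1)
        x = torch.cat([bos, x[:, 1:]], dim=1)
        h, _ = self.lstm(x)
        return self.head(h)  # next-token logits

emulator = ConditionedEmulator(vocab_size=32)
logits = emulator(torch.zeros(4, 10, dtype=torch.long), torch.randn(4, 16))
```

Injecting z only at the BOS position leaves the rest of the input sequence untouched, so a single emulator can imitate many different RNNs purely through the representation it is conditioned on.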