Learning Useful Representations of Recurrent Neural Network Weight Matrices
Authors: Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct empirical analyses and comparisons across the different encoder architectures using these datasets, showing which encoders are more effective. |
| Researcher Affiliation | Academia | The Swiss AI Lab IDSIA, USI & SUPSI; AI Initiative, KAUST. |
| Pseudocode | No | The paper describes methods and architectures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release the first two model zoo datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other one of classifiers of sequentially processed MNIST digits. https://github.com/vincentherrmann/rnn-weights-representation-learning |
| Open Datasets | Yes | To evaluate the methods described and foster further research, we develop and release two model zoo datasets for RNNs. ... https://github.com/vincentherrmann/rnn-weights-representation-learning |
| Dataset Splits | Yes | The datasets are divided into training, validation, and out-of-distribution (OOD) test splits, with tasks in each split being non-overlapping. |
| Hardware Specification | Yes | We also thank NVIDIA Corporation for donating a DGX-1 as part of the Pioneers of AI Research Award. |
| Software Dependencies | No | The paper mentions the AdamW optimizer and a learning rate schedule, but it does not specify versions of the programming languages, libraries, or other dependencies needed to reproduce the experiments. |
| Experiment Setup | Yes | The hyperparameters of these encoders are selected to ensure a comparable number of parameters across all models. Each encoder generates a 16-dimensional representation z. An LSTM with two layers functions as the emulator Aξ. The conditioning of Aξ on an RNN fθ is implemented by incorporating a linear projection of the corresponding representation z to the BOS token of the input sequence of Aξ. More details and hyperparameters can be found in Appendix D. |
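To make the conditioning mechanism quoted in the Experiment Setup row concrete, here is a minimal PyTorch sketch. Only the 16-dimensional representation z, the two-layer LSTM emulator, and the linear projection of z onto the BOS token come from the paper; the class name, vocabulary size, embedding and hidden dimensions, and the additive form of the projection are illustrative assumptions, not the authors' exact implementation (see their repository for the real code).

```python
import torch
import torch.nn as nn

class ConditionedEmulator(nn.Module):
    """Sketch of an emulator A_xi conditioned on a representation z of an RNN f_theta.

    A linear projection of z is applied to the embedding of the BOS token,
    the first position of the emulator's input sequence. Sizes other than
    z_dim=16 and num_layers=2 are hypothetical placeholders.
    """

    def __init__(self, vocab_size=32, embed_dim=64, hidden_dim=128, z_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.z_proj = nn.Linear(z_dim, embed_dim)  # maps z into the embedding space
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, z):
        # tokens: (batch, seq_len), with tokens[:, 0] being the BOS token
        # z:      (batch, z_dim) representation of the RNN being emulated
        x = self.embed(tokens)
        bos = x[:, :1] + self.z_proj(z).unsqueeze(1)  # condition only the BOS position
        x = torch.cat([bos, x[:, 1:]], dim=1)
        out, _ = self.lstm(x)
        return self.head(out)  # next-token logits, (batch, seq_len, vocab_size)

# Usage: a dummy batch of 4 sequences starting with BOS (token 0),
# each conditioned on a random 16-dimensional weight representation.
emulator = ConditionedEmulator()
tokens = torch.zeros(4, 10, dtype=torch.long)
z = torch.randn(4, 16)
logits = emulator(tokens, z)  # shape (4, 10, 32)
```

Injecting z only at the BOS position lets the recurrent state carry the conditioning information through the whole sequence without modifying the LSTM architecture itself.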