Training Linear Finite-State Machines

Authors: Arash Ardakani, Amir Ardakani, Warren Gross

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we introduce a method that can train a multi-layer FSM-based network where FSMs are connected to every FSM in the previous and the next layer. We show that the proposed FSM-based network can synthesize multi-input complex functions such as 2D Gabor filters and can perform non-sequential tasks such as image classifications on stochastic streams with no multiplication since FSMs are implemented by look-up tables only. As the second application of FSM-based networks, we perform an image classification task on the MNIST dataset. To demonstrate the capability of our FSM-based model in processing temporal data, we perform the CLLM task on Penn Treebank [32], War & Peace [33] and Linux kernel [33] corpora where the performance is measured in terms of bit per character (BPC)." (A toy simulation of one such linear FSM acting on a stochastic bit-stream is sketched after the table.)
Researcher Affiliation | Academia | "Arash Ardakani, Amir Ardakani, and Warren J. Gross, Department of Electrical and Computer Engineering, McGill University, Montreal, Canada. {arash.ardakani, amir.ardakani}@mail.mcgill.ca, warren.gross@mcgill.ca"
Pseudocode | Yes | "The details of the training algorithm are provided in Appendix A." "The details of the training algorithm are provided in Appendix C."
Open Source Code | No | The paper does not provide an unambiguous statement or a link regarding the public release of the source code for the described methodology.
Open Datasets | Yes | "As the second application of FSM-based networks, we perform an image classification task on the MNIST dataset [23]. To demonstrate the capability of our FSM-based model in processing temporal data, we perform the CLLM task on Penn Treebank [32], War & Peace [33] and Linux kernel [33] corpora where the performance is measured in terms of bit per character (BPC)." (A short note on how BPC is computed follows the table.)
Dataset Splits | No | The paper refers to a 'test set' for datasets such as MNIST and Penn Treebank, but does not provide specific details on the training, validation, and test splits (e.g., percentages, sample counts, or explicit references to a predefined splitting methodology).
Hardware Specification | Yes | "For instance, a long short-term memory (LSTM) [20] of size 1000, which is a popular variant of RNNs, cannot fit into the GeForce GTX 1080 Ti for the step sizes beyond 2000. Figure 4 shows the memory usage of the LSTM model versus the FSM-based model and their corresponding test accuracy performance on the GeForce GTX 1080 Ti for different numbers of time steps when both models have the same number of weights and use the batch size of 100 for the CLLM on the Penn Treebank dataset [32]."
Software Dependencies | No | The paper mentions 'Adam as the optimizer' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | "We used the MSE as our loss function and Adam as the optimizer with the learning rate of 0.1. We set the number of hidden nodes of all the models to 1000 (i.e., dh = 1000) for the Penn Treebank corpus and 500 (i.e., dh = 500) for the War & Peace and the Linux Kernel corpora to obtain the simulation results reported in Table 2." "[Both models] use the batch size of 100 for the CLLM on the Penn Treebank dataset [32]." (A minimal training-configuration sketch using these settings follows the table.)
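
The "Research Type" row above describes the paper's core idea: layers of linear FSMs driven by stochastic bit-streams and realized purely with look-up tables, so inference requires no multiplications. The snippet below is a minimal, illustrative simulation of one such linear FSM, modelled as a saturating up/down counter whose output bit is read from a per-state look-up table. It is a sketch of the general stochastic-computing construction, not the authors' trainable formulation, and the names `run_linear_fsm`, `n_states`, and `output_table` are invented here for illustration.

```python
import numpy as np

def run_linear_fsm(bitstream, n_states=8, output_table=None):
    """Simulate a linear FSM (saturating up/down counter) on a stochastic bit-stream.

    A '1' input moves the state up, a '0' moves it down; the output bit is read
    from a per-state look-up table, so no multiplications are involved.
    """
    if output_table is None:
        # Illustrative output table: emit 1 only in the upper half of the states.
        output_table = np.array([0] * (n_states // 2) + [1] * (n_states - n_states // 2))
    state = n_states // 2
    out = np.empty_like(bitstream)
    for t, bit in enumerate(bitstream):
        state = min(state + 1, n_states - 1) if bit else max(state - 1, 0)
        out[t] = output_table[state]
    return out

# Encode a probability p as a Bernoulli bit-stream and pass it through the FSM.
rng = np.random.default_rng(0)
p = 0.7
stream = (rng.random(1024) < p).astype(np.int64)
y = run_linear_fsm(stream)
print("input value ~", stream.mean(), "| FSM output value ~", y.mean())
```

With this particular output table, the saturating counter behaves like the classic FSM approximation of a scaled tanh used in stochastic computing, which is one reason such networks can avoid multipliers.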
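
The "Open Datasets" row notes that character-level language modelling (CLLM) quality is reported in bits per character (BPC). BPC is the average negative base-2 log-probability the model assigns to each ground-truth character; the helper below is a hypothetical illustration of that formula, not code from the paper.

```python
import numpy as np

def bits_per_character(char_probs):
    """BPC = -(1/N) * sum(log2 p_i), where p_i is the probability assigned
    to the i-th ground-truth character."""
    char_probs = np.asarray(char_probs, dtype=np.float64)
    return float(-np.mean(np.log2(char_probs)))

# A model assigning probability 0.5 to every correct character scores exactly 1 BPC.
print(bits_per_character([0.5] * 100))  # -> 1.0
```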
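
Finally, the "Experiment Setup" row pins down the optimization choices: MSE loss, Adam with a learning rate of 0.1, d_h = 1000 hidden nodes for Penn Treebank (500 for War & Peace and Linux Kernel), and a batch size of 100. The PyTorch-style sketch below wires up that configuration; the stand-in model, the assumed vocabulary size, and the `train_step` helper are illustrative placeholders, since the paper's actual FSM-based layers are not reproduced here.

```python
import torch
import torch.nn as nn

# Hyper-parameters quoted in the Experiment Setup row.
d_h = 1000            # hidden nodes for Penn Treebank (500 for War & Peace / Linux Kernel)
batch_size = 100
learning_rate = 0.1
vocab_size = 50       # assumed character-vocabulary size, for illustration only

# Placeholder architecture standing in for the paper's FSM-based network.
model = nn.Sequential(nn.Linear(vocab_size, d_h), nn.ReLU(), nn.Linear(d_h, vocab_size))

criterion = nn.MSELoss()                                              # MSE loss, as stated
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)    # Adam, lr = 0.1

def train_step(x, target):
    """One optimization step on a batch of float inputs and targets."""
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with random data shaped (batch_size, vocab_size).
x = torch.rand(batch_size, vocab_size)
target = torch.rand(batch_size, vocab_size)
print(train_step(x, target))
```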