Training Linear Finite-State Machines
Authors: Arash Ardakani, Amir Ardakani, Warren Gross
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce a method that can train a multi-layer FSM-based network where each FSM is connected to every FSM in the previous and the next layer. We show that the proposed FSM-based network can synthesize multi-input complex functions such as 2D Gabor filters and can perform non-sequential tasks such as image classification on stochastic streams with no multiplication, since FSMs are implemented by look-up tables only (a generic FSM-on-stochastic-stream sketch follows the table). As the second application of FSM-based networks, we perform an image classification task on the MNIST dataset. To demonstrate the capability of our FSM-based model in processing temporal data, we perform the CLLM task on Penn Treebank [32], War & Peace [33] and Linux kernel [33] corpora, where the performance is measured in terms of bits per character (BPC). |
| Researcher Affiliation | Academia | Arash Ardakani, Amir Ardakani, and Warren J. Gross, Department of Electrical and Computer Engineering, McGill University, Montreal, Canada; {arash.ardakani, amir.ardakani}@mail.mcgill.ca, warren.gross@mcgill.ca |
| Pseudocode | Yes | The details of the training algorithm are provided in Appendix A. The details of the training algorithm are provided in Appendix C. |
| Open Source Code | No | The paper does not provide an unambiguous statement or a link regarding the public release of the source code for the described methodology. |
| Open Datasets | Yes | As the second application of FSM-based networks, we perform an image classification task on the MNIST dataset [23]. To demonstrate the capability of our FSM-based model in processing temporal data, we perform the CLLM task on Penn Treebank [32], War & Peace [33] and Linux kernel [33] corpora where the performance is measured in terms of bit per character (BPC). |
| Dataset Splits | No | The paper refers to 'test set' for datasets like MNIST and Penn Treebank, but does not provide specific details on the training, validation, and test splits (e.g., percentages, sample counts, or explicit references to a predefined splitting methodology). |
| Hardware Specification | Yes | For instance, a long short-term memory (LSTM) [20] of size 1000, which is a popular variant of RNNs, cannot fit into the GeForce GTX 1080 Ti for step sizes beyond 2000. Figure 4 shows the memory usage of the LSTM model versus the FSM-based model and their corresponding test accuracy on the GeForce GTX 1080 Ti for different numbers of time steps, when both models have the same number of weights and use a batch size of 100 for the CLLM on the Penn Treebank dataset [32]. |
| Software Dependencies | No | The paper mentions 'Adam as the optimizer' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | We used MSE as our loss function and Adam as the optimizer with a learning rate of 0.1. We set the number of hidden nodes of all the models to 1000 (i.e., dh = 1000) for the Penn Treebank corpus and 500 (i.e., dh = 500) for the War & Peace and the Linux Kernel corpora to obtain the simulation results reported in Table 2. A batch size of 100 is used for the CLLM on the Penn Treebank dataset [32] (a hedged training-configuration sketch follows the table). |
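
The Research Type row above quotes the paper's description of FSM-based layers operating on stochastic bit streams via look-up tables. As a minimal, generic illustration of that idea (not the paper's trained linear-FSM construction), the sketch below runs a classic saturating up/down counter FSM from stochastic computing over a Bernoulli bit stream; the state-to-output look-up table, state count, and tanh-like LUT choice are assumptions made purely for illustration.

```python
import numpy as np

def stochastic_stream(value, length, rng):
    """Encode a value in [0, 1] as a Bernoulli bit stream of the given length."""
    return (rng.random(length) < value).astype(np.uint8)

def fsm_on_stream(bits, n_states=16, output_lut=None):
    """Run a saturating up/down counter FSM over a stochastic bit stream.

    The state increments on a 1 and decrements on a 0, saturating at the
    boundaries. An output look-up table maps each state to an output bit;
    the mean of the output bits approximates a function of the input
    probability, computed without any multiplier.
    """
    if output_lut is None:
        # Illustrative LUT: emit 1 in the upper half of the state space,
        # giving a sigmoidal (tanh-like) response to the input probability.
        output_lut = np.array([0] * (n_states // 2) + [1] * (n_states // 2), dtype=np.uint8)
    state = n_states // 2
    out = np.empty_like(bits)
    for t, b in enumerate(bits):
        out[t] = output_lut[state]
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
    return out

rng = np.random.default_rng(0)
bits = stochastic_stream(0.8, 4096, rng)
print(fsm_on_stream(bits).mean())  # well above 0.5, since the input probability is 0.8
```

In hardware, the loop body reduces to a small counter plus a LUT read per clock cycle, which is why such FSM-based elements avoid multipliers entirely.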
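The Experiment Setup row pins down only the optimizer, loss, learning rate, hidden sizes, and batch size. The sketch below wires those quoted hyperparameters into a PyTorch training step; `FSMNet`, the vocabulary size, and the random inputs are hypothetical stand-ins, since the paper's actual FSM-based model is not reproduced here.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the table; everything else is an assumption.
HIDDEN = {"ptb": 1000, "war_and_peace": 500, "linux_kernel": 500}
BATCH_SIZE = 100
LEARNING_RATE = 0.1

class FSMNet(nn.Module):
    """Hypothetical placeholder for the paper's FSM-based model."""
    def __init__(self, vocab_size, d_h):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(vocab_size, d_h), nn.Tanh(),
                                  nn.Linear(d_h, vocab_size))

    def forward(self, x):
        return self.body(x)

def train_step(model, optimizer, loss_fn, x, target):
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()
    optimizer.step()
    return loss.item()

vocab_size = 50  # assumed character-level vocabulary size for Penn Treebank
model = FSMNet(vocab_size, HIDDEN["ptb"])
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  # Adam, lr = 0.1
loss_fn = nn.MSELoss()  # MSE loss, as stated in the setup

x = torch.rand(BATCH_SIZE, vocab_size)       # dummy inputs for illustration
target = torch.rand(BATCH_SIZE, vocab_size)  # dummy targets for illustration
print(train_step(model, optimizer, loss_fn, x, target))
```

Swapping `HIDDEN["ptb"]` for `HIDDEN["war_and_peace"]` or `HIDDEN["linux_kernel"]` reproduces the dh = 500 setting quoted for the other two corpora.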