DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales
Authors: Brandon Jacques, Zoran Tiganj, Marc Howard, Per B Sederberg
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare DeepSITH to LSTMs and other recent RNNs on several time series prediction and decoding tasks. DeepSITH achieves results comparable to state-of-the-art performance on these problems and continues to perform well even as the delays are increased. |
| Researcher Affiliation | Academia | Brandon G. Jacques, Department of Psychology, University of Virginia, bgj5hk@virginia.edu; Zoran Tiganj, Department of Computer Science, Indiana University, ztiganj@iu.edu; Marc W. Howard, Department of Psychological and Brain Sciences, Boston University, marc777@bu.edu; Per B. Sederberg, Department of Psychology, University of Virginia, pbs5u@virginia.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | We provide all the code for DeepSITH and the subsequent analysis in our GitHub here. |
| Open Datasets | Yes | In the MNIST task [22], handwritten numerical digits can be identified by neural networks with almost 100% accuracy utilizing a convolutional neural network (CNN). |
| Dataset Splits | Yes | The Deep SITH network is trained with a batch size of 64, with a cross entropy loss function, with a training/test split of 80%-20%. |
| Hardware Specification | No | We did not run anything that could not be run on a single GPU. |
| Software Dependencies | No | The paper mentions "PyTorch machine learning framework [19]" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Table 1 summarizes the hyperparameters used in the experiments. ... The DeepSITH network is trained with a batch size of 64, with a cross entropy loss function, with a training/test split of 80%-20%. In between each layer we applied batch normalization, and applied a step learning rate annealing after every third of the training epochs (2e-3, 2e-4, 2e-5). (A training sketch of this configuration follows the table.) |
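
The Experiment Setup row describes the training configuration only in prose. Below is a minimal PyTorch sketch of that configuration, assuming a generic classification dataset. The DeepSITH layer itself is not reproduced here: `HypotheticalLayer` is a placeholder (the real implementation is in the authors' GitHub repository noted in the Open Source Code row), and the optimizer choice and synthetic data are assumptions for illustration. Only the batch size (64), cross-entropy loss, 80%/20% split, batch normalization between layers, and the step learning-rate schedule (2e-3 → 2e-4 → 2e-5 after every third of the training epochs) come from the quoted text.

```python
# Minimal sketch of the quoted training setup; DeepSITH layers and the optimizer
# choice are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


class HypotheticalLayer(nn.Module):
    """Placeholder for a DeepSITH layer (hypothetical stand-in)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        return torch.relu(self.linear(x))


def build_model(in_features=64, hidden=128, n_classes=10):
    # Batch normalization applied in between each layer, as stated in the paper.
    return nn.Sequential(
        HypotheticalLayer(in_features, hidden),
        nn.BatchNorm1d(hidden),
        HypotheticalLayer(hidden, hidden),
        nn.BatchNorm1d(hidden),
        nn.Linear(hidden, n_classes),
    )


def train(dataset, epochs=30, device="cpu"):
    # 80%/20% training/test split, as quoted above.
    n_train = int(0.8 * len(dataset))
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

    model = build_model().to(device)
    criterion = nn.CrossEntropyLoss()
    # Optimizer choice is an assumption; the paper excerpt does not specify it.
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
    # Anneal the learning rate by 10x after every third of the training epochs:
    # 2e-3 -> 2e-4 -> 2e-5.
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=max(1, epochs // 3), gamma=0.1
    )

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model, test_set


if __name__ == "__main__":
    # Synthetic data stands in for the paper's benchmarks (illustration only).
    X = torch.randn(1000, 64)
    y = torch.randint(0, 10, (1000,))
    train(TensorDataset(X, y), epochs=6)
```

As noted in the Hardware Specification row, nothing in the paper required more than a single GPU, so this sketch runs on CPU by default and can be moved to a GPU by passing `device="cuda"`.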