MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

Authors: Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations show that MIMOConv achieves a 2–4× speedup at an accuracy delta within [+0.68, −3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2–4 inputs at once while maintaining a high average accuracy within a [−1.07, −3.43]% delta on the Long Range Arena benchmark. (A toy sketch of computation in superposition follows the table.)
Researcher Affiliation | Collaboration | Nicolas Menet (1,2) menetn@ethz.ch; Michael Hersche (1,2) her@zurich.ibm.com; Geethan Karunaratne (1) kar@zurich.ibm.com; Luca Benini (2) lbenini@iis.ee.ethz.ch; Abu Sebastian (1) ase@zurich.ibm.com; Abbas Rahimi (1) abr@zurich.ibm.com. Affiliations: 1 = IBM Research Zurich, 2 = ETH Zurich.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
Open Datasets | Yes | Empirical evaluations on CIFAR10 and CIFAR100 show that a MIMOConv built on a WideResNet-28-10 [11]... We evaluate MIMOFormer on five tasks from LRA [14]... Figure 4 compares MIMOConv with DataMUX [13] on the MNIST dataset... Finally, we tested MIMOConv on the SVHN dataset.
Dataset Splits | No | The paper states: "The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode. Appendix E provides more details on the trade-off between fast and slow mode training." This describes a training strategy for a dynamic model rather than a train/validation/test split; the paper does not give explicit split percentages or counts for reproducibility. (A sketch of the fast/slow training loop follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers, such as library or solver names and their exact versions.
Experiment Setup | Yes | All experiments are repeated five times with a different random seed... We set the number of feature maps after the first convolutional layer to D=64... The function ϕ in the self-attention block consists of an R=256-dimensional projection and a ReLU activation... The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode... To stabilize training in the case of N=4, we implemented a curriculum training procedure where the number of superpositions is reduced to N′ = N/2 during a warmup phase (1/6th of the training steps). (Sketches of ϕ and the curriculum schedule follow the table.)
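
To make the headline result concrete, here is a minimal sketch of computation in superposition: N inputs are bound to fixed bipolar keys, summed into a single tensor, pushed through one shared backbone pass, and read out by per-input heads. `MIMOWrapper`, the key initialization, and the head layout are illustrative assumptions, not the paper's implementation; MIMOConv and MIMOFormer use their own binding and unbinding schemes.

```python
import torch
import torch.nn as nn

class MIMOWrapper(nn.Module):
    """Toy computation-in-superposition wrapper (hypothetical names).

    Up to n_inputs tensors are bound to fixed random keys, summed into a
    single superposed tensor, processed by ONE forward pass of a shared
    backbone, and mapped to per-input predictions by separate heads.
    """

    def __init__(self, backbone: nn.Module, n_inputs: int,
                 in_shape, feat_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone
        # Fixed bipolar (+/-1) binding keys, one per superposition channel.
        keys = torch.randint(0, 2, (n_inputs, *in_shape)).float() * 2 - 1
        self.register_buffer("keys", keys)
        # One linear head per input recovers its prediction (unbinding).
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n_classes) for _ in range(n_inputs))

    def forward(self, xs):
        # Accepts 1..n_inputs tensors of shape (batch, *in_shape); fewer
        # inputs simply use fewer keys (a single input = slow mode).
        superposed = sum(k * x for k, x in zip(self.keys, xs))
        feats = self.backbone(superposed)   # single shared forward pass
        return [h(feats) for h, _ in zip(self.heads, xs)]

# Usage with a stand-in backbone (two CIFAR-sized inputs in one pass):
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = MIMOWrapper(backbone, n_inputs=2, in_shape=(3, 32, 32),
                    feat_dim=16, n_classes=10)
logits = model([torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)])
```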
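The fast/slow-mode quote under Dataset Splits describes a batch-level training mix rather than a data split. Below is a sketch of such a loop, assuming a per-batch coin flip approximates the 80/20 ratio and that slow mode means processing a single input without superposition; `train_epoch` and the model interface are hypothetical, not the paper's code.

```python
import random

FAST_FRACTION = 0.8  # from the paper: 80% fast-mode, 20% slow-mode batches

def train_epoch(model, loader, optimizer, loss_fn, n_superposed=2):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        if random.random() < FAST_FRACTION:
            # Fast mode: split the batch into groups and process them
            # together in superposition (one shared forward pass).
            xs = batch_x.chunk(n_superposed)
            ys = batch_y.chunk(n_superposed)
            logits = model(list(xs))          # list of per-input logits
            loss = sum(loss_fn(lg, y) for lg, y in zip(logits, ys))
        else:
            # Slow mode: one input stream, no superposition.
            loss = loss_fn(model([batch_x])[0], batch_y)
        loss.backward()
        optimizer.step()
```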
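Two fragments of the quoted setup lend themselves to code: the attention feature map ϕ (an R=256-dimensional projection followed by a ReLU) and the N→N/2 warmup curriculum. The sketch below uses standard PyTorch modules; the class and function names and the projection's initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionFeatureMap(nn.Module):
    """phi from the reported setup: an R=256-dimensional projection
    followed by a ReLU, applied inside the self-attention block."""

    def __init__(self, d_head: int, r: int = 256):
        super().__init__()
        self.proj = nn.Linear(d_head, r, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))

def superposition_schedule(step: int, total_steps: int, n_max: int = 4) -> int:
    """Curriculum from the setup: train with N/2 superposed inputs during
    a warmup phase of 1/6th of the steps, then switch to the full N."""
    warmup_steps = total_steps // 6
    return n_max // 2 if step < warmup_steps else n_max
```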