MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Authors: Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that MIMOConv achieves a 2–4× speedup at an accuracy delta within [+0.68, −3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2–4 inputs at once while maintaining a high average accuracy within a [−1.07, −3.43]% delta on the Long Range Arena benchmark. |
| Researcher Affiliation | Collaboration | Nicolas Menet (1,2) menetn@ethz.ch; Michael Hersche (1,2) her@zurich.ibm.com; Geethan Karunaratne (1) kar@zurich.ibm.com; Luca Benini (2) lbenini@iis.ee.ethz.ch; Abu Sebastian (1) ase@zurich.ibm.com; Abbas Rahimi (1) abr@zurich.ibm.com. (1) IBM Research Zurich, (2) ETH Zurich |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets. |
| Open Datasets | Yes | Empirical evaluations on CIFAR10 and CIFAR100 show that a MIMOConv built on a WideResNet-28-10 [11]...We evaluate MIMOFormer on five tasks from LRA [14]...Figure 4 compares MIMOConv with DataMUX [13] on the MNIST dataset...Finally, we tested MIMOConv on the SVHN dataset. |
| Dataset Splits | No | The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode. Appendix E provides more details on the trade-off between fast and slow mode training. (This describes a training strategy for a dynamic model, not a general train/validation/test dataset split.) The paper does not explicitly provide specific train/validation/test dataset split percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as library or solver names and their exact versions. |
| Experiment Setup | Yes | All experiments are repeated five times with a different random seed...We set the number of feature maps after the first convolutional layer to D=64...The function ϕ in the self-attention block consists of an R=256 dimensional projection and a ReLU activation...The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode...To stabilize training in the case of N=4, we implemented a curriculum training procedure where the number of superpositions is reduced to N′ = N/2 during a warmup phase (1/6th of the training steps). (A hedged configuration sketch follows the table.) |
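The setup details quoted above (the R=256-dimensional projection with ReLU as ϕ, the 80%/20% fast/slow mode split, and the N′ = N/2 warmup curriculum for N=4) can be summarized in a short sketch. This is a hedged illustration under assumed names (`PhiFeatureMap`, `superposition_schedule`, `sample_training_mode`) and placeholder values; it is not the authors' implementation, for which the released repository at https://github.com/IBM/multiple-input-multiple-output-nets is authoritative.

```python
import random
import torch
import torch.nn as nn


class PhiFeatureMap(nn.Module):
    """Sketch of the function phi in the self-attention block:
    an R=256-dimensional linear projection followed by a ReLU activation."""

    def __init__(self, d_model: int, r: int = 256):
        super().__init__()
        self.proj = nn.Linear(d_model, r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.proj(x))


def superposition_schedule(step: int, total_steps: int, n_max: int = 4) -> int:
    """Curriculum for N=4: reduce the number of superpositions to N/2
    during a warmup phase covering the first 1/6 of the training steps."""
    warmup_steps = total_steps // 6
    return n_max // 2 if step < warmup_steps else n_max


def sample_training_mode(rng: random.Random) -> str:
    """80% of batches are trained in fast mode, 20% in slow mode
    (assumed here to be sampled independently per batch)."""
    return "fast" if rng.random() < 0.8 else "slow"


if __name__ == "__main__":
    rng = random.Random(0)            # one of five random seeds
    total_steps = 60_000              # placeholder step budget, not from the paper
    phi = PhiFeatureMap(d_model=256)  # placeholder model width, not from the paper
    for step in (0, 5_000, 10_000, 30_000):
        print(step, superposition_schedule(step, total_steps), sample_training_mode(rng))
```

The per-batch sampling of fast versus slow mode and the total step count are assumptions made for the sketch; only the 80/20 ratio, the ϕ structure, and the N′ = N/2 warmup over 1/6 of training are stated in the paper.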