Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Authors: Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that MIMOConv achieves 2–4× speedup at an accuracy delta within [+0.68, −3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2–4 inputs at once while maintaining a high average accuracy within a [−1.07, −3.43]% delta on the long range arena benchmark. |
| Researcher Affiliation | Collaboration | Nicolas Menet (1,2), Michael Hersche (1,2), Geethan Karunaratne (1), Luca Benini (2), Abu Sebastian (1), Abbas Rahimi (1); (1) IBM Research Zurich, (2) ETH Zurich |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets. |
| Open Datasets | Yes | Empirical evaluations on CIFAR10 and CIFAR100 show that a MIMOConv built on a WideResNet-28-10 [11]...We evaluate MIMOFormer on five tasks from LRA [14]...Figure 4 compares MIMOConv with DataMUX [13] on the MNIST dataset...Finally, we tested MIMOConv on the SVHN dataset. |
| Dataset Splits | No | The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode. Appendix E provides more details on the trade-off between fast and slow mode training. (This describes a training strategy for a dynamic model, not a general train/validation/test dataset split.) The paper does not explicitly provide specific train/validation/test dataset split percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers, such as library or solver names and their exact versions. |
| Experiment Setup | Yes | All experiments are repeated five times with a different random seed...We set the number of feature maps after the first convolutional layer to D=64...The function ϕ in the self-attention block consists of an R=256 dimensional projection and a ReLU activation...The models are trained on 80% of the batches in fast mode and on 20% of the batches in slow mode...To stabilize training in the case of N=4, we implemented a curriculum training procedure where the number of superpositions is reduced to N′=N/2 during a warmup phase (1/6th of the training steps). |
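
The training schedule quoted under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the fast/slow mode is assumed to be drawn at random per batch with probability 0.8 (the quoted text gives only the 80%/20% proportion), and the warmup is assumed to cover the first sixth of the steps.

```python
import random


def superposition_count(step, total_steps, n, warmup_divisor=6):
    """Return the number of superposed inputs used at a given training step.

    Assumption: the warmup covers the first total_steps // warmup_divisor
    steps, during which the count is halved (N/2), as in the quoted
    curriculum for N=4.
    """
    warmup_steps = total_steps // warmup_divisor
    return n // 2 if step < warmup_steps else n


def sample_mode(rng, fast_probability=0.8):
    """Pick 'fast' or 'slow' mode for one batch.

    Assumption: the mode is sampled independently per batch; the source
    only states the overall 80%/20% proportion.
    """
    return "fast" if rng.random() < fast_probability else "slow"


def build_schedule(total_steps, n, seed=0):
    """Build a list of (superposition_count, mode) pairs, one per step."""
    rng = random.Random(seed)  # seeded so repeated runs give the same schedule
    return [
        (superposition_count(step, total_steps, n), sample_mode(rng))
        for step in range(total_steps)
    ]


if __name__ == "__main__":
    schedule = build_schedule(total_steps=600, n=4)
    fast_share = sum(mode == "fast" for _, mode in schedule) / len(schedule)
    print("first step:", schedule[0])
    print("last step:", schedule[-1])
    print("share of fast-mode batches:", round(fast_share, 3))
```

Changing `seed` mirrors the quoted practice of repeating each experiment with a different random seed; the realized share of fast-mode batches only approximates 80% under the random-sampling assumption.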