Spatial Mixture-of-Experts

Authors: Nikoli Dryden, Torsten Hoefler

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We conduct experiments on several benchmark datasets with SMoEs and conduct extensive ablation studies of SMoE design decisions (Section 3). |
| Researcher Affiliation | Academia | Nikoli Dryden, ETH Zürich (ndryden@ethz.ch); Torsten Hoefler, ETH Zürich (htor@inf.ethz.ch) |
| Pseudocode | No | The paper describes the SMoE layer and training process conceptually and mathematically but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/spcl/smoe. |
| Open Datasets | Yes | WeatherBench [78], the ENS-10 dataset [8], and ImageNet-1k [24]. The paper also states: "Most datasets we use are publicly available." |
| Dataset Splits | Yes | We use the data subset suggested by Rasp et al. [78] at 5.625° resolution (32×64 grid points) and train on data from 1979–2015, validate on 2016, and report test results for 2017–2018. |
| Hardware Specification | Yes | All results were run using PyTorch [73] version 1.11 on a large cluster with 16 GB V100 GPUs. |
| Software Dependencies | Yes | All results were run using PyTorch [73] version 1.11 on a large cluster with 16 GB V100 GPUs. |
| Experiment Setup | Yes | All models were trained with batch size 32, Adam [54] with a learning rate of 0.001 (decayed by 10× after no validation improvement for 15 epochs), and early stopping after no improvement for 30 epochs. |
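Since the paper itself provides no pseudocode (see the Pseudocode row), the block below is a minimal PyTorch sketch of one plausible reading of a spatial mixture-of-experts layer: a learned, location-dependent gate routes each grid point among a set of 1×1-convolution experts. The class name `SpatialMoE` and every design detail here are illustrative assumptions, not the authors' method; their real implementation lives at https://github.com/spcl/smoe.

```python
import torch
import torch.nn as nn

class SpatialMoE(nn.Module):
    """Illustrative sketch of a spatial mixture-of-experts layer.

    Each spatial location (h, w) gets its own learned routing weights
    over `num_experts` 1x1-conv experts. This is a hypothetical reading
    of the SMoE idea, not the paper's actual layer.
    """

    def __init__(self, in_ch, out_ch, num_experts, height, width):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in range(num_experts)
        )
        # Learned routing table: one logit per expert per grid point.
        self.gate_logits = nn.Parameter(torch.zeros(num_experts, height, width))

    def forward(self, x):  # x: (batch, in_ch, height, width)
        # Soft (softmax) routing keeps this sketch differentiable end to
        # end; the paper's routing and training scheme differs in detail.
        weights = torch.softmax(self.gate_logits, dim=0)          # (E, H, W)
        expert_out = torch.stack([e(x) for e in self.experts])    # (E, B, C, H, W)
        return (weights[:, None, None] * expert_out).sum(dim=0)   # (B, C, H, W)

# Usage on a WeatherBench-sized 32x64 grid:
# layer = SpatialMoE(8, 16, num_experts=4, height=32, width=64)
# out = layer(torch.randn(2, 8, 32, 64))
```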
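The Dataset Splits row pins the splits to calendar years. A short sketch of how such year-based splits might be taken from a WeatherBench-style NetCDF archive using xarray; the file path is hypothetical, and xarray is a common choice for this data rather than something the paper specifies.

```python
import xarray as xr

# Hypothetical path; the WeatherBench repository documents the real layout.
ds = xr.open_mfdataset("weatherbench/5.625deg/*.nc", combine="by_coords")

# Year-based splits quoted from the paper: train on 1979-2015,
# validate on 2016, report test results for 2017-2018.
train = ds.sel(time=slice("1979", "2015"))
val = ds.sel(time=slice("2016", "2016"))
test = ds.sel(time=slice("2017", "2018"))
```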
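The Experiment Setup row maps directly onto standard PyTorch components. Below is a hedged sketch of that recipe, assuming `ReduceLROnPlateau` for the 10× decay and a manual counter for early stopping; `model`, `train_one_epoch`, `evaluate`, the data loaders, and `max_epochs` are placeholders, not names from the paper's code.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

max_epochs = 1000  # placeholder upper bound; early stopping ends training sooner

# Adam with lr 0.001, as quoted from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# "decayed by 10x after no validation improvement for 15 epochs"
scheduler = ReduceLROnPlateau(optimizer, factor=0.1, patience=15)

best_val, epochs_since_best = float("inf"), 0
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)  # loaders use batch size 32
    val_loss = evaluate(model, val_loader)
    scheduler.step(val_loss)
    if val_loss < best_val:
        best_val, epochs_since_best = val_loss, 0
    else:
        epochs_since_best += 1
        if epochs_since_best >= 30:  # early-stopping patience from the paper
            break
```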