Spatial Mixture-of-Experts
Authors: Nikoli Dryden, Torsten Hoefler
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several benchmark datasets with SMoEs and conduct extensive ablation studies of SMoE design decisions (§3). |
| Researcher Affiliation | Academia | Nikoli Dryden, ETH Zürich, ndryden@ethz.ch; Torsten Hoefler, ETH Zürich, htor@inf.ethz.ch |
| Pseudocode | No | The paper describes the SMOE layer and training processes conceptually and mathematically but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/spcl/smoe. |
| Open Datasets | Yes | WeatherBench [78], ENS-10 dataset [8], ImageNet-1k [24]. The paper also states: "Most datasets we use are publicly available." |
| Dataset Splits | Yes | We use the data subset suggested by Rasp et al. [78] at 5.625° resolution (32×64 grid points) and train on data from 1979–2015, validate on 2016, and report test results for 2017–2018. |
| Hardware Specification | Yes | All results were run using PyTorch [73] version 1.11 on a large cluster with 16 GB V100 GPUs. |
| Software Dependencies | Yes | All results were run using PyTorch [73] version 1.11 on a large cluster with 16 GB V100 GPUs. |
| Experiment Setup | Yes | All models were trained with batch size 32, Adam [54] with a learning rate of 0.001 (decayed by a factor of 10 after no validation improvement for 15 epochs), and early stopping after no improvement for 30 epochs. |
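The learning-rate schedule described in the experiment setup can be sketched in plain Python; in practice PyTorch's `ReduceLROnPlateau` scheduler plus an early-stopping loop would play this role. The class and method names below are illustrative, not taken from the paper's code.

```python
class PlateauSchedule:
    """Sketch of the reported training schedule: a learning rate of
    0.001 decayed by a factor of 10 after 15 epochs without validation
    improvement, with early stopping after 30 such epochs.
    (Hypothetical helper; the paper does not provide this code.)"""

    def __init__(self, lr=0.001, decay_factor=10.0,
                 decay_patience=15, stop_patience=30):
        self.lr = lr
        self.decay_factor = decay_factor
        self.decay_patience = decay_patience
        self.stop_patience = stop_patience
        self.best = float("inf")   # best validation loss seen so far
        self.bad_epochs = 0        # epochs since last improvement
        self.stopped = False       # early-stopping flag

    def step(self, val_loss):
        """Record one epoch's validation loss; return the current lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            # Decay lr each time a full patience window elapses.
            if self.bad_epochs % self.decay_patience == 0:
                self.lr /= self.decay_factor
            # Stop training after 30 epochs without improvement.
            if self.bad_epochs >= self.stop_patience:
                self.stopped = True
        return self.lr
```

For example, a run whose validation loss plateaus immediately would see the learning rate drop to 1e-4 at epoch 15 of the plateau and trigger early stopping (with a second decay) at epoch 30.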