Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms
Authors: Ashok Makkuva, Pramod Viswanath, Sreeram Kannan, Sewoong Oh
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines. |
| Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, IL, USA; (2) Allen School of Computer Science & Engineering, University of Washington, Seattle, USA; (3) Department of Electrical Engineering, University of Washington, Seattle, USA. |
| Pseudocode | Yes | Algorithm 1 Learning the regressors... Algorithm 2 Learning the gating parameter |
| Open Source Code | Yes | Codes are available at this repository: MoE codes. |
| Open Datasets | Yes | To highlight the generalizability of our algorithm, in Appendix H.2 of the supplement, we compare the performance of our algorithm to that of the standard approaches on a variety of real world datasets. References include: Brooks, T., Pope, D., and Marcolini., A. Airfoil self-noise and prediction. Technical report, NASA, 1989. URL https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise. Liu, Y.-C. and Yeh, I.-C. Using mixture design and neural networks to build stock selection decision support systems. Neural Computing and Applications, 28(3):521-535, 2017. doi: 10.1007/s00521-015-2090-x. URL https://archive.ics.uci.edu/ml/datasets/Stock+portfolio+performance. Yeh, I.-C. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797-1808, 1998. URL https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength. |
| Dataset Splits | No | The paper describes generating synthetic data with parameters like n=2000 or n=8000 and d=10, and also mentions using real-world datasets, but it does not specify explicit training, validation, or test splits (e.g., percentages or counts) for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "Orth-ALS package by (Sharan & Valiant, 2017)" but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | For the experiments, we consider the similar setting as before with $k = 2$, $d = 10$, $\sigma = 0.1$ and the gating parameter $w$ is drawn uniformly from $\mathbb{S}^{9}$ without the orthogonality restriction. We let $x_i \sim \mathcal{N}(0, I_d)$ i.i.d. We choose $n = 2000$... We let the number of mixture components be $k = 3$ and $k = 4$. We let $x \sim \mathcal{N}(0, I_d)$ and the gating parameters are drawn uniformly from $\mathbb{S}^{9}$... $n = 8000$, $d = 10$, $\sigma = 0.5$. |
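
The first setting quoted in the Experiment Setup row (k = 2, d = 10, σ = 0.1, n = 2000, Gaussian inputs, gating parameter drawn uniformly from the unit sphere) can be illustrated with a small data generator. This is only a sketch, not the authors' code. Several choices below are assumptions that the table does not state: the expert regressors are drawn uniformly from the unit sphere, each expert is linear (identity activation), the gate for k = 2 is a sigmoid of the inner product of w and x, and the names `sample_unit_sphere` and `generate_moe_data` are hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d
    by normalizing a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def generate_moe_data(n=2000, d=10, sigma=0.1, seed=0):
    """Sample n points from a 2-expert mixture-of-experts model.

    Values n, d, sigma follow the quoted setup; the regressor
    distribution, linear experts, and sigmoid gate are assumptions.
    """
    rng = random.Random(seed)
    w = sample_unit_sphere(d, rng)  # gating parameter, uniform on the sphere
    # Assumption: regressors are also drawn uniformly from the sphere.
    regressors = [sample_unit_sphere(d, rng) for _ in range(2)]

    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # x ~ N(0, I_d)
        # Assumption: sigmoid gate picks expert 0 with probability sigmoid(<w, x>).
        p_first = 1.0 / (1.0 + math.exp(-dot(w, x)))
        z = 0 if rng.random() < p_first else 1
        # Assumption: linear expert plus Gaussian noise with std sigma.
        y = dot(regressors[z], x) + rng.gauss(0.0, sigma)
        xs.append(x)
        ys.append(y)
    return xs, ys, w, regressors
```

Calling `generate_moe_data()` with its defaults returns 2000 input vectors of dimension 10 with their responses, plus the sampled gating parameter and regressors. The source does not specify train/validation/test splits, so none are applied here.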