reproducibilityindex.ai

Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms

Authors: Ashok Makkuva, Pramod Viswanath, Sreeram Kannan, Sewoong Oh

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically validate our algorithm on both the synthetic and real data sets in a variety of settings, and show superior performance to standard baselines.
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, IL, USA; (2) Allen School of Computer Science & Engineering, University of Washington, Seattle, USA; (3) Department of Electrical Engineering, University of Washington, Seattle, USA.
Pseudocode	Yes	Algorithm 1 Learning the regressors... Algorithm 2 Learning the gating parameter
Open Source Code	Yes	Codes are available at this repository: MoE codes.
Open Datasets	Yes	To highlight the generalizability of our algorithm, in Appendix H.2 of the supplement, we compare the performance of our algorithm to that of the standard approaches on a variety of real world datasets. References include: Brooks, T., Pope, D., and Marcolini, A. Airfoil self-noise and prediction. Technical report, NASA, 1989. URL https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise. Liu, Y.-C. and Yeh, I.-C. Using mixture design and neural networks to build stock selection decision support systems. Neural Computing and Applications, 28(3):521-535, 2017. doi: 10.1007/s00521-015-2090-x. URL https://archive.ics.uci.edu/ml/datasets/Stock+portfolio+performance. Yeh, I.-C. Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797-1808, 1998. URL https://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength.
Dataset Splits	No	The paper describes generating synthetic data with parameters like n=2000 or n=8000 and d=10, and also mentions using real-world datasets, but it does not specify explicit training, validation, or test splits (e.g., percentages or counts) for these datasets.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies	No	The paper mentions using the "Orth-ALS package by (Sharan & Valiant, 2017)" but does not provide a specific version number for this or any other software dependency.
Experiment Setup	Yes	For the experiments, we consider the similar setting as before with k = 2, d = 10, σ = 0.1 and the gating parameter w is drawn uniformly from S^9 without the orthogonality restriction. We let x_i ~ i.i.d. N(0, I_d). We choose n = 2000... We let the number of mixture components be k = 3 and k = 4. We let x ~ N(0, I_d) and the gating parameters are drawn uniformly from S^9... n = 8000, d = 10, σ = 0.5.
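
The k = 2 synthetic setting quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. Only n = 2000, d = 10, σ = 0.1, Gaussian inputs, and a gating parameter drawn uniformly from the unit sphere come from the quoted text. The sigmoid gate, the linear experts, the regressors a1 and a2 drawn uniformly from the sphere, the seed, and all function names are assumptions made for illustration.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generate_moe_data(n=2000, d=10, sigma=0.1, seed=0):
    """Sample (x, y) pairs from a two-expert mixture-of-experts model.

    Assumptions (not stated in the quoted setup): the gate is
    sigmoid(w . x), both experts are linear, and the regressors a1, a2
    are drawn uniformly from the unit sphere.
    """
    rng = random.Random(seed)
    w = sample_unit_sphere(d, rng)   # gating parameter, uniform on S^(d-1)
    a1 = sample_unit_sphere(d, rng)  # hypothetical regressor for expert 1
    a2 = sample_unit_sphere(d, rng)  # hypothetical regressor for expert 2
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]   # x ~ N(0, I_d)
        use_first = rng.random() < sigmoid(dot(w, x))  # gate picks an expert
        a = a1 if use_first else a2
        y = dot(a, x) + rng.gauss(0.0, sigma)          # additive Gaussian noise
        xs.append(x)
        ys.append(y)
    return xs, ys, (w, a1, a2)


if __name__ == "__main__":
    xs, ys, params = generate_moe_data()
    print(len(xs), len(xs[0]), len(ys))
```

Since the table reports that the paper gives no explicit train/validation/test split, any split applied to data generated this way would be a further choice made by whoever reproduces the experiment.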