Demystifying Softmax Gating Function in Gaussian Mixture of Experts

Authors: Huy Nguyen, TrungTin Nguyen, Nhat Ho

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this appendix, we conduct a simulation study to empirically validate our theoretical results on the convergence rates of maximum likelihood estimation (MLE) in the softmax gating Gaussian mixture of experts established in Theorem 1 and Theorem 2."
Researcher Affiliation | Academia | "Huy Nguyen, TrungTin Nguyen, Nhat Ho. Department of Statistics and Data Sciences, The University of Texas at Austin; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK. {huynm, minhnhat}@utexas.edu, trung-tin.nguyen@inria.fr"
Pseudocode | No | The paper describes the EM algorithm and its variations in prose but does not provide a structured pseudocode block or algorithm listing. (A hedged EM sketch based on that prose description is given after the table.)
Open Source Code | No | "All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine." The paper names the language and environment but provides no link to, or explicit statement about, a code release.
Open Datasets | No | "We generate observations Y from the conditional density g_G(Y | X) of the softmax gating Gaussian mixture of experts model in equation (1)." The data are simulated from a specified model rather than drawn from a publicly available dataset with concrete access information. (A sampling sketch is given after the table.)
Dataset Splits | No | The paper describes generating synthetic data and the sample sizes used (e.g., "40 samples of size n for each setting, given 200 different choices of sample size n between 10^2 and 10^5") but does not specify train/validation/test splits or cross-validation settings.
Hardware Specification | No | "All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine." This description is too general and lacks specific hardware details such as CPU/GPU models or memory.
Software Dependencies | No | "All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine." A Python version is given, but no versions of libraries, frameworks (e.g., PyTorch, TensorFlow), or other solvers are listed.
Experiment Setup | Yes | "We choose the convergence criterion ε = 10^-6 and 2000 maximum EM iterations. Our goal is to illustrate the theoretical properties of the estimator Ĝ_n. Therefore, we have initialized the EM algorithm in a favourable way. More specifically, we first randomly partitioned the set {1, ..., k} into k* index sets J_1, ..., J_{k*}, each containing at least one point, for any given k and k* and for each replication. Finally, we sampled β_{1j} (resp. a_j, b_j, σ_j) from a Gaussian distribution centered on the true-component parameter β_{1t} (resp. a_t, b_t, σ_t), with vanishing covariance, for j ∈ J_t." (A sketch of this initialization is given after the table.)
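
To complement the Open Datasets and Dataset Splits rows above, here is a minimal sketch of how observations could be simulated from the conditional density g_G(Y | X) of a softmax-gated Gaussian mixture of experts. The function name, array shapes, the uniform covariate distribution, and the placeholder true parameters are assumptions for illustration; sigma is treated as a standard deviation. The sample-size grid mirrors the reported 200 choices of n between 10^2 and 10^5.

```python
import numpy as np
from scipy.special import softmax

def sample_softmax_gmoe(X, beta1, beta0, a, b, sigma, rng):
    """Draw Y ~ g_G(. | X) from a softmax-gated Gaussian mixture of experts.

    Gating:  P(z = j | x) = softmax_j(beta1[j] @ x + beta0[j])
    Expert:  Y | z = j, x  ~  N(a[j] @ x + b[j], sigma[j]**2)
    Assumed shapes: X (n, d), beta1 and a (k, d), beta0, b, sigma (k,).
    """
    k = beta0.shape[0]
    probs = softmax(X @ beta1.T + beta0, axis=1)          # (n, k) input-dependent mixing weights
    z = np.array([rng.choice(k, p=p) for p in probs])     # latent expert index per observation
    mean = np.einsum("nd,nd->n", X, a[z]) + b[z]          # expert mean a_z' x + b_z
    return rng.normal(mean, sigma[z]), z

# Example run with placeholder "true" parameters (not the paper's values).
rng = np.random.default_rng(0)
sample_sizes = np.unique(np.geomspace(1e2, 1e5, 200).astype(int))  # 200 sizes in [10^2, 10^5]
k_star, d = 2, 2
beta1_t = rng.standard_normal((k_star, d)); beta0_t = np.zeros(k_star)
a_t = rng.standard_normal((k_star, d)); b_t = np.zeros(k_star); sigma_t = np.full(k_star, 0.5)
n = int(sample_sizes[0])
X = rng.uniform(-1.0, 1.0, size=(n, d))
Y, _ = sample_softmax_gmoe(X, beta1_t, beta0_t, a_t, b_t, sigma_t, rng)
```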
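
The Pseudocode row notes that the EM algorithm is described only in prose. Below is a hedged, minimal EM-style sketch for fitting the model above, using the convergence criterion ε = 10^-6 and the 2000-iteration cap quoted in the Experiment Setup row. Because the softmax-gating M-step has no closed form, this sketch takes a few gradient steps of a responsibility-weighted softmax regression; that choice, the tiny ridge term, the step sizes, and the omission of any identifiability constraint on the gating parameters are implementation assumptions, not the authors' procedure.

```python
import numpy as np
from scipy.special import logsumexp, softmax
from scipy.stats import norm

def em_softmax_gmoe(X, Y, beta1, beta0, a, b, sigma,
                    tol=1e-6, max_iter=2000, gate_lr=0.5, gate_steps=25):
    """EM-style fit of a k-expert softmax-gated Gaussian mixture of experts (sketch)."""
    n, d = X.shape
    k = beta0.shape[0]
    Xe = np.hstack([X, np.ones((n, 1))])                  # design matrix with intercept column
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r_ij = P(z_i = j | x_i, y_i, current parameters).
        log_gate = X @ beta1.T + beta0
        log_gate -= logsumexp(log_gate, axis=1, keepdims=True)
        log_comp = norm.logpdf(Y[:, None], loc=X @ a.T + b, scale=sigma)
        log_joint = log_gate + log_comp
        ll = logsumexp(log_joint, axis=1).sum()           # observed-data log-likelihood
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

        # M-step, experts: weighted least squares for (a_j, b_j), weighted variance for sigma_j.
        for j in range(k):
            w = resp[:, j]
            G = Xe.T @ (Xe * w[:, None]) + 1e-8 * np.eye(d + 1)   # tiny ridge for stability
            coef = np.linalg.solve(G, Xe.T @ (w * Y))
            a[j], b[j] = coef[:d], coef[d]
            res = Y - Xe @ coef
            sigma[j] = np.sqrt((w * res ** 2).sum() / max(w.sum(), 1e-12))

        # M-step, gating: no closed form; gradient steps of responsibility-weighted softmax regression.
        for _ in range(gate_steps):
            p = softmax(X @ beta1.T + beta0, axis=1)
            beta1 += gate_lr * (resp - p).T @ X / n
            beta0 += gate_lr * (resp - p).sum(axis=0) / n

        if abs(ll - prev_ll) < tol:                       # convergence criterion epsilon
            break
        prev_ll = ll
    return beta1, beta0, a, b, sigma
```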
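
Finally, a sketch of the favourable initialization quoted in the Experiment Setup row: the fitted indices {1, ..., k} are partitioned into k* non-empty sets and each fitted component starts from a small Gaussian perturbation of the true component it is assigned to. The fixed `noise` scale stands in for the paper's "vanishing covariance", and perturbing the gating intercepts beta0 alongside β_{1j}, a_j, b_j, σ_j is an extra assumption of this sketch.

```python
import numpy as np

def favourable_init(beta1_true, beta0_true, a_true, b_true, sigma_true,
                    k, noise=1e-2, rng=None):
    """Initialize an over-specified k-expert model near a true k*-expert model (k >= k*).

    Randomly partitions the fitted indices into k* non-empty sets J_1, ..., J_{k*};
    every fitted index assigned to true component t starts from a small Gaussian
    perturbation of that component's parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    k_star = beta0_true.shape[0]

    # Surjective assignment fitted index -> true component, so each J_t is non-empty.
    assign = np.empty(k, dtype=int)
    perm = rng.permutation(k)
    assign[perm[:k_star]] = np.arange(k_star)
    assign[perm[k_star:]] = rng.integers(0, k_star, size=k - k_star)

    jitter = lambda v: v[assign] + noise * rng.standard_normal(v[assign].shape)
    return (jitter(beta1_true), jitter(beta0_true), jitter(a_true),
            jitter(b_true), np.abs(jitter(sigma_true)))
```

The returned tuple can be passed directly as the initial parameters of em_softmax_gmoe above, mimicking the paper's strategy of starting EM close to the true mixture.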