Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Authors: Huy Nguyen, TrungTin Nguyen, Nhat Ho
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this appendix, we conduct a simulation study to empirically validate our theoretical results on the convergence rates of maximum likelihood estimation (MLE) in the softmax gating Gaussian mixture of experts established in Theorem 1 and Theorem 2. |
| Researcher Affiliation | Academia | Huy Nguyen, TrungTin Nguyen, Nhat Ho; Department of Statistics and Data Sciences, The University of Texas at Austin; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK; {huynm, minhnhat}@utexas.edu, trung-tin.nguyen@inria.fr |
| Pseudocode | No | The paper describes the EM algorithm and its variations in prose but does not provide a structured pseudocode block or algorithm listing. |
| Open Source Code | No | All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine. The paper mentions the language and environment used but does not provide a link or explicit statement for code release. |
| Open Datasets | No | We generate observations Y from the conditional density g_{G*}(Y|X) of the softmax gating Gaussian mixture of experts model in equation (1). The data is simulated from a defined model, not sourced from a publicly available dataset with concrete access information (a minimal sampling sketch follows the table). |
| Dataset Splits | No | The paper describes generating synthetic data and defining sample sizes (e.g., "40 samples of size n for each setting, given 200 different choices of sample size n between 10² and 10⁵") but does not specify train/validation/test splits or cross-validation settings. |
| Hardware Specification | No | All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine. This description is too general and lacks specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | All code for our simulation study below was written in Python 3.9.13 on a standard Unix machine. While a Python version is given, no specific versions of libraries, frameworks (e.g., PyTorch, TensorFlow), or other solvers are listed. |
| Experiment Setup | Yes | We choose the convergence criterion ε = 10⁻⁶ and a maximum of 2000 EM iterations. Our goal is to illustrate the theoretical properties of the estimator Ĝ_n. Therefore, we initialized the EM algorithm in a favourable way. More specifically, we first randomly partitioned the set {1, ..., k} into k* index sets J_1, ..., J_{k*}, each containing at least one point, for any given k and k* and for each replication. Finally, we sampled β_{1j} (resp. a_j, b_j, σ_j) from a Gaussian distribution centered on β*_{1t} (resp. a*_t, b*_t, σ*_t), with vanishing covariance, so that j ∈ J_t (a sketch of this initialization appears after the table). |
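To make the quoted data-generation setup concrete, here is a minimal sketch of sampling (X, Y) pairs from a softmax gating Gaussian mixture of experts. The parameter names (`beta1`, `beta0`, `a`, `b`, `sigma`) mirror the paper's notation, but the covariate distribution, the specific true values, and the function name `sample_softmax_gmoe` are illustrative assumptions, not the paper's settings.

```python
# Hypothetical sketch: draw X, pick an expert via the softmax gate,
# then sample Y from that expert's Gaussian. Placeholder parameters only.
import numpy as np

rng = np.random.default_rng(0)

def sample_softmax_gmoe(n, beta1, beta0, a, b, sigma):
    """Sample n pairs (X, Y) from a softmax gating Gaussian mixture of experts."""
    k = len(beta0)
    X = rng.uniform(-1.0, 1.0, size=n)                 # covariates (assumed design)
    logits = np.outer(X, beta1) + beta0                # gating scores beta1_j * x + beta0_j
    gate = np.exp(logits - logits.max(axis=1, keepdims=True))
    gate /= gate.sum(axis=1, keepdims=True)            # softmax gating probabilities
    z = np.array([rng.choice(k, p=p) for p in gate])   # latent expert assignment
    Y = rng.normal(a[z] * X + b[z], sigma[z])          # expert: N(a_j x + b_j, sigma_j^2)
    return X, Y

# Illustrative true parameters for k* = 2 experts (placeholders).
X, Y = sample_softmax_gmoe(
    n=1000,
    beta1=np.array([1.0, -1.0]),
    beta0=np.array([0.0, 0.0]),
    a=np.array([2.0, -2.0]),
    b=np.array([0.5, -0.5]),
    sigma=np.array([0.3, 0.4]),
)
```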
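The "favourable" EM initialization quoted in the Experiment Setup row can also be sketched in a few lines: partition the fitted component indices into k* non-empty groups, then draw each fitted parameter from a tight Gaussian around the true parameter of its group. The helper name `favourable_init` and the noise scale `eps` (standing in for the "vanishing covariance") are assumptions for illustration.

```python
# Hypothetical sketch of the favourable initialization: partition {0, ..., k-1}
# into k* non-empty groups J_1, ..., J_{k*}, then sample each fitted parameter
# near the true parameter of its group. All names are placeholders.
import numpy as np

rng = np.random.default_rng(1)

def favourable_init(k, true_params, eps=1e-2):
    """Initialize k fitted components around the k* true components."""
    k_star = len(true_params["b"])
    # Random partition of {0, ..., k-1} into k* non-empty index sets.
    perm = rng.permutation(k)
    cuts = np.sort(rng.choice(np.arange(1, k), size=k_star - 1, replace=False))
    groups = np.split(perm, cuts)                  # J_1, ..., J_{k*}, each non-empty
    init = {name: np.empty(k) for name in true_params}
    for t, J in enumerate(groups):
        for name, vals in true_params.items():
            # Sample near the t-th true value with small ("vanishing") variance.
            init[name][J] = rng.normal(vals[t], eps, size=len(J))
    return init

# Example: over-specified fit with k = 3 components around k* = 2 true ones.
true_params = {"beta1": np.array([1.0, -1.0]), "a": np.array([2.0, -2.0]),
               "b": np.array([0.5, -0.5]), "sigma": np.array([0.3, 0.4])}
init = favourable_init(k=3, true_params=true_params)
```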