Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Authors: Huy Nguyen, Pedram Akbarian, Nhat Ho
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct a simulation study to empirically validate our theoretical results. In this section, we perform numerical experiments to empirically confirm the theoretical convergence rates of maximum likelihood estimation (MLE) in both standard and activation dense-to-sparse gating MoE models. ... The graphs in Figure 1 illustrate the empirical convergence rates of the MLE Ĝ_n to the true mixing measure G under different Voronoi metrics. |
| Researcher Affiliation | Academia | Huy Nguyen 1 Pedram Akbarian 2 Nhat Ho 1 1Department of Statistics and Data Sciences 2Department of Electrical and Computer Engineering, The University of Texas at Austin. Correspondence to: Huy Nguyen <huynm@utexas.edu>. |
| Pseudocode | No | The paper describes the Expectation-Maximization (EM) algorithm in Appendix D, detailing the E-step and M-step in paragraph form, but it does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | No | We conducted experiments using synthetic datasets generated with the true mixing measure G = ∑_{i=1}^{2} exp(β_{0i}/τ) δ_{(β_{1i}, τ, a_i, b_i, ν_i)} of order k = 2. The paper mentions using synthetic datasets but does not provide access information (link, DOI, citation, or repository) for a publicly available or open dataset. |
| Dataset Splits | No | The paper states it uses “synthetic datasets” for “numerical experiments” with varying “sample size n,” but it does not provide specific details on how these datasets are split into training, validation, or test sets (e.g., percentages, counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or specific computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper describes the algorithms used (e.g., “Expectation-Maximization (EM) algorithm”), but it does not list any specific software components with version numbers (e.g., programming languages, libraries, or solvers) used for implementation. |
| Experiment Setup | Yes | Furthermore, it is noteworthy to highlight that we choose the convergence criterion as ϵ = 10^{-6} and execute a maximum of 1000 iterations for the EM algorithm, with 100 iterations for the Iteratively Reweighted Least Squares (IRLS) algorithm at each EM iteration, employing a learning rate of η = 0.01. The values corresponding to the true parameters are detailed in Table 2. |
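The paper describes its EM procedure only in paragraph form (Appendix D) and releases no code. For orientation, here is a minimal, generic EM loop for a one-dimensional two-component Gaussian mixture using the convergence criterion ϵ = 10^{-6} and 1000-iteration cap quoted above. This is an illustrative sketch only: it is a plain GMM fit, not the paper's softmax/dense-to-sparse gating MoE model, and the function name and initialization scheme are our own assumptions, not the authors'.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, tol=1e-6, max_iter=1000, seed=0):
    """Generic EM for a 1D Gaussian mixture (illustrative sketch only;
    not the paper's MoE model or its EM+IRLS procedure)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialization (assumed, not from the paper): uniform weights,
    # means drawn from the data, shared empirical variance.
    w = np.full(n_components, 1.0 / n_components)
    mu = rng.choice(x, n_components, replace=False)
    var = np.full(n_components, np.var(x))
    ll_old = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r[i, k] ∝ w_k * N(x_i | mu_k, var_k).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = w * dens
        total = weighted.sum(axis=1, keepdims=True)
        r = weighted / total
        # M-step: closed-form updates for weights, means, variances.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # Convergence check on the log-likelihood, with tol = 1e-6
        # matching the paper's stated criterion ϵ = 10^{-6}.
        ll = np.log(total).sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return w, mu, var
```

Note that the paper's actual M-step has no closed form for the gating parameters, which is why the authors report running 100 IRLS iterations inside each EM iteration; the closed-form updates above apply only to this simplified Gaussian-mixture case.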