Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Authors: Huy Nguyen, Pedram Akbarian, Nhat Ho
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct a simulation study to empirically validate our theoretical results. In this section, we perform numerical experiments to empirically confirm the theoretical convergence rates of maximum likelihood estimation (MLE) in both standard and activation dense-to-sparse gating MoE models. ... The graphs in Figure 1 illustrate the empirical convergence rates of the MLE Ĝ_n to the true mixing measure G under different Voronoi metrics. |
| Researcher Affiliation | Academia | Huy Nguyen 1 Pedram Akbarian 2 Nhat Ho 1 1Department of Statistics and Data Sciences 2Department of Electrical and Computer Engineering, The University of Texas at Austin. Correspondence to: Huy Nguyen <huynm@utexas.edu>. |
| Pseudocode | No | The paper describes the Expectation-Maximization (EM) algorithm in Appendix D, detailing the E-step and M-step in paragraph form, but it does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | No | We conducted experiments using synthetic datasets generated with the true mixing measure G = ∑_{i=1}^{2} exp(β_{0i}/τ) δ_{(β_{1i}, τ, a_i, b_i, ν_i)} of order k = 2. The paper mentions using synthetic datasets but does not provide access information (link, DOI, citation, or repository) for a publicly available or open dataset. |
| Dataset Splits | No | The paper states it uses “synthetic datasets” for “numerical experiments” with varying “sample size n,” but it does not provide specific details on how these datasets are split into training, validation, or test sets (e.g., percentages, counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or specific computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper describes the algorithms used (e.g., “Expectation-Maximization (EM) algorithm”), but it does not list any specific software components with version numbers (e.g., programming languages, libraries, or solvers) used for implementation. |
| Experiment Setup | Yes | Furthermore, it is noteworthy to highlight that we choose the convergence criterion as ϵ = 10^{-6} and execute a maximum of 1000 iterations for the EM algorithm, with 100 iterations for the Iteratively Reweighted Least Squares (IRLS) algorithm at each EM iteration, employing a learning rate of η = 0.01. The values corresponding to the true parameters are detailed in Table 2. |
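The paper describes its EM procedure only in paragraph form (Appendix D) and releases no code. For orientation, here is a minimal, generic EM loop for a one-dimensional two-component Gaussian mixture using the convergence criterion ϵ = 10^{-6} and 1000-iteration cap quoted above. This is an illustrative sketch only: it is a plain GMM fit, not the paper's softmax/dense-to-sparse gating MoE model, and the function name and initialization scheme are our own assumptions, not the authors'.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, tol=1e-6, max_iter=1000, seed=0):
    """Generic EM for a 1D Gaussian mixture (illustrative sketch only;
    not the paper's MoE model or its EM+IRLS procedure)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialization (assumed, not from the paper): uniform weights,
    # means drawn from the data, shared empirical variance.
    w = np.full(n_components, 1.0 / n_components)
    mu = rng.choice(x, n_components, replace=False)
    var = np.full(n_components, np.var(x))
    ll_old = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities r[i, k] ∝ w_k * N(x_i | mu_k, var_k).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        weighted = w * dens
        total = weighted.sum(axis=1, keepdims=True)
        r = weighted / total
        # M-step: closed-form updates for weights, means, variances.
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # Convergence check on the log-likelihood, with tol = 1e-6
        # matching the paper's stated criterion ϵ = 10^{-6}.
        ll = np.log(total).sum()
        if abs(ll - ll_old) < tol:
            break
        ll_old = ll
    return w, mu, var
```

Note that the paper's actual M-step has no closed form for the gating parameters, which is why the authors report running 100 IRLS iterations inside each EM iteration; the closed-form updates above apply only to this simplified Gaussian-mixture case.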