Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Mean Estimation of Truncated Mixtures of Two Gaussians: A Gradient Based Approach

Authors: Sai Ganesh Nagarajan, Gerasimos Palaiopanos, Ioannis Panageas, Tushar Vaidya, Samson Yu

AAAI 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we propose a gradient based variant of the EM algorithm that has global convergence guarantees when d = 1 and local convergence for d > 1 to the true means. Moreover, the update rule at every iteration is easy to compute which makes the proposed method practical. We also provide numerous experiments to obtain more insights into the effect of truncation on the convergence to the true parameters in high dimensions.
Researcher Affiliation	Academia	Sai Ganesh Nagarajan1, Gerasimos Palaiopanos2, Ioannis Panageas3, Tushar Vaidya4, Samson Yu 5 1 EPFL 2 University of Pittsburgh 3 University of California, Irvine 4 NTU 5 NUS
Pseudocode	Yes	Algorithm 1: Gradient-Truncated EM Output mean estimate µ Initialize λ0, choose ϵ > 0 and set η = α2(B+1) (see Theorems 2, 3, 4 for specific choices of η) While f(λt) 2 > ϵ λt+1 = λt η Eλt,S x T Σ 1 tanh(x T Σ 1λt) Eµ,S x T Σ 1 tanh(x T Σ 1λt)
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	No	The paper uses a truncated mixture model for simulation rather than a publicly available dataset. It generates synthetic data based on this model for its experiments, therefore, no specific access information for a public dataset is provided.
Dataset Splits	No	The paper does not describe dataset splits for training, validation, or testing. It performs simulations by varying initial points and truncation sets, which is not equivalent to standard data splitting for model training and evaluation.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments. It does not mention CPU, GPU, or any other computing specifications.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers used for the experiments. It does not list programming languages, libraries, or solvers with their versions.
Experiment Setup	Yes	Algorithm 1: Gradient-Truncated EM [...] set η = α2(B+1) (see Theorems 2, 3, 4 for specific choices of η) [...] Let the threshold to reach a certain error be defined by ϵ. When we mention the average rates, we consider the number of iterations required to reach within ϵ of the true parameters from a particular initial condition and then take the average number of iterations over 50 randomly chosen starting points. The performance analysis of (Gradient-Truncated EM) on box-truncation sets in 2-dimensions is deferred to the corresponding section of the supplementary material. [...] We let the measure of the truncated set vary and then record the number of iterations required to a threshold of error ϵ = 0.1.