Big Learning Expectation Maximization
Authors: Yulai Cong, Sijia Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through simulated experiments, we empirically show that the Big Learn-EM is capable of delivering the optimal with high probability; comparisons on benchmark clustering datasets further demonstrate its effectiveness and advantages over existing techniques. |
| Researcher Affiliation | Academia | Yulai Cong*, Sijia Li; Sun Yat-sen University; yulaicong@gmail.com, lisijia57@163.com |
| Pseudocode | Yes | Algorithm 1: Big Learning Expectation Maximization |
| Open Source Code | Yes | The code is available at https://github.com/YulaiCong/Big-Learning-Expectation-Maximization. |
| Open Datasets | Yes | To validate the effectiveness of the Big Learn-EM in real-world clustering applications, we conduct comprehensive experiments on diverse clustering datasets, including Connect-4, Covtype, Glass, Letter, Pendigits, Satimage, Seismic, Svmguide2, and Vehicle (see Appendix B for details). We follow the experimental setup in Cai et al. (2022) and conduct an experiment on the Fashion MNIST dataset. |
| Dataset Splits | No | The paper describes the use of 'test joint KL divergence' but does not specify the explicit percentages or sample counts for training, validation, and test splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions `scipy.stats.ortho_group` in Algorithm 1, implying Python and SciPy usage, but does not provide version numbers for any software dependency. |
| Experiment Setup | Yes | Algorithm 1 inputs: training data, the number K of mixture components, probabilities [P1, P2] for joint and marginal matchings, and the number W of local updates. The training objective in Eq. (12) also involves a hyper-parameter γ, and η > 0 in Eq. (11) is a small constant. |
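The table quotes Algorithm 1's inputs (K components, matching probabilities [P1, P2], W local updates) without reproducing the pseudocode. For orientation, below is a minimal sketch of the standard EM baseline that Big Learn-EM extends: a plain EM loop for a 1-D Gaussian mixture. This is not the authors' algorithm; the synthetic data, initialization, and iteration count are illustrative assumptions, and only the symbols K and η are borrowed from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two 1-D Gaussian clusters (a stand-in for
# the paper's simulated experiments, not the authors' actual setup).
X = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

K = 2                              # number of mixture components (the K in Algorithm 1)
weights = np.full(K, 1.0 / K)      # mixing proportions
mu = np.array([X.min(), X.max()])  # spread-out initial means
var = np.full(K, X.var())          # initial variances
eta = 1e-6                         # small positive constant, cf. the η > 0 in Eq. (11)

for _ in range(100):
    # E-step: responsibilities r[n, k] ∝ weights[k] * N(x_n | mu[k], var[k]),
    # computed in log space for numerical stability.
    diff = X[:, None] - mu[None, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + diff**2 / var)
    log_r = np.log(weights) + log_pdf
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: closed-form updates of proportions, means, and variances.
    Nk = r.sum(axis=0)
    weights = Nk / Nk.sum()
    mu = (r * X[:, None]).sum(axis=0) / Nk
    var = (r * (X[:, None] - mu[None, :])**2).sum(axis=0) / Nk + eta

print(np.round(np.sort(mu), 1))
```

Per the inputs listed in the table, Big Learn-EM goes beyond this baseline by sampling joint and marginal matchings with probabilities [P1, P2] and performing W local updates per iteration; those details are specified in the paper's Algorithm 1.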